• DocumentCode
    2261117
  • Title

    Extending WordNet with compound nouns for semi-automatic annotation in data integration systems

  • Author

    Beneventano, Domenico ; Bergamaschi, Sonia ; Sorrentino, Serena

  • Author_Institution
    DII, Univ. of Modena & Reggio Emilia, Modena, Italy
  • fYear
    2009
  • fDate
    24-27 Sept. 2009
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    The focus of data integration systems is on producing a comprehensive global schema successfully integrating data from heterogeneous data sources (heterogeneous in format and in structure). Starting from the "meanings" associated to schema elements (i.e. class/attribute labels) and exploiting the structural knowledge of sources, it is possible to discover relationships among the elements of different schemata. Lexical annotation is the explicit inclusion of the "meaning" of a data source element according to a lexical resource. Accuracy of semi-automatic lexical annotator tools is poor on real-world schemata due to the abundance of non-dictionary compound nouns. It follows that a large set of relationships among different schemata is discovered, including a great amount of false positive relationships. In this paper we propose a new method for the annotation of non-dictionary compound nouns, which draws its inspiration from works in the natural language disambiguation area. The method extends the lexical annotation module of the MOMIS data integration system.
  • Keywords
    computational linguistics; natural language processing; text analysis; WordNet; data integration system; heterogeneous data sources; natural language disambiguation; nondictionary compound nouns; schemata; semiautomatic lexical annotation; structural knowledge; Databases; Humans; Logic; Natural languages; Terminology; Thesauri; Compound noun; WordNet; annotation; data integration
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
  • Conference_Location
    Dalian
  • Print_ISBN
    978-1-4244-4538-7
  • Electronic_ISBN
    978-1-4244-4540-0
  • Type

    conf

  • DOI
    10.1109/NLPKE.2009.5313842
  • Filename
    5313842