• DocumentCode
    1827153
  • Title

    Automatic feature thesaurus enrichment: extracting generic terms from digital gazetteer

  • Author

    Wang, Jun ; Ge, Ning

  • Author_Institution
    Inf. Manage. Dept., Peking Univ., Beijing
  • fYear
    2006
  • fDate
    June 2006
  • Firstpage
    326
  • Lastpage
    333
  • Abstract
    ADL Gazetteer is a digitized worldwide gazetteer developed in the Alexandria Digital Library (ADL) Project that contains millions of geographic names (placenames). The placenames are indexed with type terms from the ADL Feature Type Thesaurus (FTT), a hierarchical category scheme. This paper proposes a two-step method to enrich the category scheme automatically: first, discover frequent generic terms by detecting phrase boundaries with a mutual information-based method; second, correlate the generic terms with the relevant type terms by hierarchical clustering. The established correlation pairs can then be used to supplement the FTT with the generic terms found. Extensive experiments conducted on millions of ADL Gazetteer (ADLG) placenames demonstrated the effectiveness of the proposed methods. Beyond thesaurus enrichment, potential applications of this research include suggesting likely type terms when categorizing new placenames and helping users choose likely search terms.
  • Keywords
    classification; digital libraries; geographic information systems; thesauri; ADL Gazetteer; ADL feature type thesaurus; ADLG placenames; Alexandria Digital Library; FTT hierarchical category scheme; automatic feature thesaurus enrichment; digital gazetteer; generic term extraction; geographic names; hierarchical clustering; Data mining; Dictionaries; Feature extraction; Indexing; Information management; Permission; Phase detection; Phase frequency detector; Software libraries; Thesauri; automatic gazetteer updating; correlation analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Libraries, 2006. JCDL '06. Proceedings of the 6th ACM/IEEE-CS Joint Conference on
  • Conference_Location
    Chapel Hill, NC
  • Print_ISBN
    1-59593-354-9
  • Type

    conf

  • DOI
    10.1145/1141753.1141828
  • Filename
    4119149