Title :
Automatic feature thesaurus enrichment: extracting generic terms from digital gazetteer
Author :
Wang, Jun ; Ge, Ning
Author_Institution :
Inf. Manage. Dept., Peking Univ., Beijing
Abstract :
The ADL Gazetteer (ADLG) is a digitized worldwide gazetteer developed in the Alexandria Digital Library (ADL) Project; it contains millions of geographic names (placenames). The placenames are indexed with type terms from the ADL Feature Type Thesaurus (FTT), a hierarchical category scheme. The paper proposes a two-step method to enrich this category scheme automatically: first, frequent generic terms are discovered by detecting phrase boundaries with a mutual-information-based method; second, the generic terms are correlated with the relevant type terms by hierarchical clustering. The resulting correlation pairs can then be used to supplement the FTT with the generic terms found. Extensive experiments on millions of ADLG placenames demonstrated the effectiveness of the proposed methods. Besides thesaurus enrichment, potential applications of this research include suggesting likely type terms when categorizing new placenames and helping users choose likely search terms.
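The two steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation; the abstract does not give the exact mutual-information formula, boundary criterion, or clustering features, so every concrete choice here is an assumption: pointwise mutual information (PMI) between adjacent words, a "weakest link" rule that treats the words after the lowest-PMI link as the generic term (which presumes trailing generics such as "Bear Creek", not leading ones such as "Lake Tahoe"), and average-linkage clustering of generic terms by the type terms of the placenames that contain them. The records, type terms, `min_count`, and `max_distance` are hypothetical toy values, not ADL Gazetteer data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy records (placename, type term); NOT real ADL Gazetteer data.
RECORDS = [
    ("Bear Creek", "streams"), ("Clear Creek", "streams"), ("Mill Creek", "streams"),
    ("Mill Brook", "streams"), ("Silver Brook", "streams"),
    ("Bear Lake", "lakes"), ("Clear Lake", "lakes"), ("Silver Lake", "lakes"),
    ("Bear Hot Springs", "springs"), ("Clear Hot Springs", "springs"),
    ("Silver Hot Springs", "springs"),
]


def adjacent_pmi(names):
    """Pointwise mutual information log2(P(x,y) / (P(x) P(y))) for adjacent word pairs."""
    unigrams, bigrams = Counter(), Counter()
    for name in names:
        words = name.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    return {
        (x, y): math.log2((c / n_bi) / ((unigrams[x] / n_uni) * (unigrams[y] / n_uni)))
        for (x, y), c in bigrams.items()
    }


def generic_term(name, pmi):
    """Assumption: the generic term is the word run after the weakest (lowest-PMI) link."""
    words = name.split()
    if len(words) < 2:
        return None
    links = [pmi[(a, b)] for a, b in zip(words, words[1:])]
    cut = links.index(min(links)) + 1  # phrase boundary sits at the weakest link
    return " ".join(words[cut:])


def frequent_generic_terms(records, min_count=2):
    """Step 1: count candidate generic terms and keep those seen at least min_count times."""
    pmi = adjacent_pmi([name for name, _ in records])
    profiles = defaultdict(Counter)  # generic term -> counts of co-occurring type terms
    for name, type_term in records:
        term = generic_term(name, pmi)
        if term is not None:
            profiles[term][type_term] += 1
    return {t: p for t, p in profiles.items() if sum(p.values()) >= min_count}


def cosine_distance(p, q):
    dot = sum(p[k] * q[k] for k in p)  # missing keys in a Counter count as 0
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / norm


def cluster(profiles, max_distance=0.5):
    """Step 2a: average-linkage agglomerative clustering of generic terms by type-term profile."""
    clusters = [[t] for t in sorted(profiles)]

    def linkage(a, b):
        return sum(cosine_distance(profiles[x], profiles[y]) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        d, i, j = min((linkage(a, b), i, j)
                      for i, a in enumerate(clusters)
                      for j, b in enumerate(clusters) if i < j)
        if d > max_distance:  # closest pair is too far apart: stop merging
            break
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
        clusters[i] = merged
    return clusters


def correlate(records, min_count=2, max_distance=0.5):
    """Step 2b: pair each frequent generic term with its cluster's dominant type term."""
    profiles = frequent_generic_terms(records, min_count)
    pairs = {}
    for members in cluster(profiles, max_distance):
        combined = sum((profiles[m] for m in members), Counter())
        type_term = combined.most_common(1)[0][0]
        pairs.update({m: type_term for m in members})
    return pairs


print(correlate(RECORDS))
# {'Brook': 'streams', 'Creek': 'streams', 'Hot Springs': 'springs', 'Lake': 'lakes'}
```

In this toy data "Hot" and "Springs" co-occur far more often than chance, so the boundary falls before "Hot" and "Hot Springs" survives as one multi-word generic term; "Creek" and "Brook" have identical type-term profiles and merge into one cluster. A production version would more likely use an established clustering routine and a PMI threshold tuned on real data; the hand-rolled loop is kept here only so the sketch has no dependencies.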
Keywords :
classification; digital libraries; geographic information systems; thesauri; ADL Gazetteer; ADL feature type thesaurus; ADLG placenames; Alexandria Digital Library; FTT hierarchical category scheme; automatic feature thesaurus enrichment; digital gazetteer; generic term extraction; geographic names; hierarchical clustering; Data mining; Dictionaries; Feature extraction; Indexing; Information management; Permission; Phase detection; Phase frequency detector; Software libraries; Thesauri; automatic gazetteer updating; correlation analysis
Conference_Title :
Digital Libraries, 2006. JCDL '06. Proceedings of the 6th ACM/IEEE-CS Joint Conference on
Conference_Location :
Chapel Hill, NC
Print_ISBN :
1-59593-354-9
DOI :
10.1145/1141753.1141828