DocumentCode :
3473051
Title :
Comparison of generality based algorithm variants for automatic taxonomy generation
Author :
Henschel, Andreas ; Woon, Wei Lee ; Wächter, Thomas ; Madnick, Stuart
Author_Institution :
Masdar Inst. of Sci. & Technol., Abu Dhabi, United Arab Emirates
fYear :
2009
fDate :
15-17 Dec. 2009
Firstpage :
160
Lastpage :
164
Abstract :
We compare a family of algorithms for the automatic generation of taxonomies, obtained by adapting the Heymann algorithm in various ways. The core algorithm determines the generality of terms and iteratively inserts them into a growing taxonomy. Variants of the algorithm are created by altering how, and how often, the generality of terms is calculated. We analyse the performance and complexity of the variants, together with a systematic threshold evaluation, on seven manually created benchmark sets. Our results show that betweenness centrality calculated on unweighted similarity graphs often performs best, but it requires threshold fine-tuning and is computationally more expensive than closeness centrality. Finally, we show how an entropy-based filter can lead to more precise taxonomies.
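The core loop described in the abstract (score each term's generality, then insert terms one by one into a growing taxonomy) can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine similarity over co-occurrence vectors, the Wasserman-Faust scaling of closeness centrality, the two thresholds, and all names (`build_taxonomy`, `graph_threshold`, `attach_threshold`, the `"ROOT"` label) are assumptions chosen for illustration. Generality is computed once up front here, which would correspond to only one of the variants the paper compares.

```python
from collections import deque
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def closeness(graph, source):
    """Closeness centrality on an unweighted graph via breadth-first search.

    Scaled by the fraction of reachable nodes so that terms in small,
    isolated components do not score artificially high (an assumption).
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    reachable = len(dist) - 1
    total = sum(dist.values())
    if total == 0 or len(graph) < 2:
        return 0.0
    return (reachable / (len(graph) - 1)) * (reachable / total)


def build_taxonomy(vectors, graph_threshold, attach_threshold):
    """Return a {term: parent} map; top-level terms get the parent "ROOT"."""
    terms = sorted(vectors)

    # 1. Unweighted similarity graph: connect terms whose similarity
    #    reaches graph_threshold.
    graph = {t: set() for t in terms}
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= graph_threshold:
                graph[a].add(b)
                graph[b].add(a)

    # 2. Generality = centrality; most general terms are inserted first
    #    (ties broken alphabetically to keep the result deterministic).
    order = sorted(terms, key=lambda t: (-closeness(graph, t), t))

    # 3. Attach each term to the most similar term already placed,
    #    or to the root if nothing is similar enough.
    parent = {}
    for term in order:
        best, best_sim = None, 0.0
        for placed in parent:
            sim = cosine(vectors[term], vectors[placed])
            if sim > best_sim:
                best, best_sim = placed, sim
        if best is not None and best_sim >= attach_threshold:
            parent[term] = best
        else:
            parent[term] = "ROOT"
    return parent


# Toy input (hypothetical): each term maps to the documents it occurs in.
vectors = {
    "animal": {"d1": 1, "d2": 1, "d3": 1, "d4": 1},
    "dog": {"d1": 1, "d2": 1},
    "cat": {"d3": 1, "d4": 1},
    "puppy": {"d1": 1},
}
taxonomy = build_taxonomy(vectors, graph_threshold=0.4, attach_threshold=0.6)
print(taxonomy)
```

On this toy input the sketch places "animal" at the top, "dog" and "cat" beneath it, and "puppy" beneath "dog". Swapping `closeness` for a betweenness computation, or recomputing the scores after each insertion, would change only step 2, which is where the abstract says the variants differ.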
Keywords :
entropy; filtering theory; graph theory; pattern classification; Heymann algorithm; automatic taxonomy generation; betweenness centrality; closeness centrality; entropy-based filter; generality based algorithm variant; systematic threshold evaluation; threshold fine-tuning; unweighted similarity graph; variant complexity; Benchmark testing; Biomedical measurements; Databases; Frequency; Gold; Iterative algorithms; Mesh generation; Ontologies; Performance analysis; Taxonomy;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Innovations in Information Technology, 2009. IIT '09. International Conference on
Conference_Location :
Al Ain
Print_ISBN :
978-1-4244-5698-7
Type :
conf
DOI :
10.1109/IIT.2009.5413365
Filename :
5413365