DocumentCode
538059
Title
Evaluation of clustering algorithms for Polish Word Sense Disambiguation
Author
Broda, Bartosz ; Mazur, Wojciech
Author_Institution
Inst. of Inf., Wroclaw Univ. of Technol., Wrocław, Poland
fYear
2010
fDate
18-20 Oct. 2010
Firstpage
25
Lastpage
32
Abstract
Word Sense Disambiguation in text is still a difficult problem, as the best supervised methods require laborious and costly manual preparation of training data. This work therefore focuses on the evaluation of selected clustering algorithms in the task of Word Sense Disambiguation for Polish. We tested six clustering algorithms (K-Means, K-Medoids, hierarchical agglomerative clustering, hierarchical divisive clustering, Growing Hierarchical Self-Organising Maps, and graph-partitioning-based clustering) and five weighting schemes. For the agglomerative and divisive algorithms, 13 criterion functions were tested. The results are interesting because the best clustering algorithms come close, in terms of cluster purity, to the precision of a supervised clustering algorithm on the same dataset using the same features.
Keywords
natural language processing; pattern clustering; text analysis; Polish; clustering algorithms; word sense disambiguation; Algorithm design and analysis; Context; Feature extraction; Mutual information; Neurons; Partitioning algorithms
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Information Technology (IMCSIT), Proceedings of the 2010 International Multiconference on
Conference_Location
Wisla
ISSN
2157-5525
Print_ISBN
978-1-4244-6432-6
Type
conf
DOI
10.1109/IMCSIT.2010.5679861
Filename
5679861