Evaluation of clustering algorithms for Polish Word Sense Disambiguation

Author

Broda, Bartosz ; Mazur, Wojciech

Author_Institution

Inst. of Inf., Wroclaw Univ. of Technol., Wrocław, Poland

fYear

2010

fDate

18-20 Oct. 2010

Firstpage

25

Lastpage

32

Abstract

Word Sense Disambiguation in text is still a difficult problem, as the best supervised methods require laborious and costly manual preparation of training data. This work therefore evaluates a few selected clustering algorithms in the task of Word Sense Disambiguation for Polish. We tested six clustering algorithms (K-Means, K-Medoids, hierarchical agglomerative clustering, hierarchical divisive clustering, Growing Hierarchical Self-Organising Maps, and graph-partitioning-based clustering) and five weighting schemes. For the agglomerative and divisive algorithms, 13 criterion functions were tested. The results are interesting because, in terms of cluster purity, the best clustering algorithms come close to the precision of a supervised algorithm on the same dataset using the same features.
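The abstract compares cluster purity with supervised precision. A minimal sketch of the standard purity measure follows: each cluster of word occurrences is credited with the count of its most frequent gold sense, and the credits are summed and divided by the total number of occurrences. This is the textbook definition, not necessarily the exact evaluation code used by the authors. The function name `cluster_purity` and the toy data for the Polish word "zamek" (castle/lock) are hypothetical illustrations.

```python
from collections import Counter, defaultdict


def cluster_purity(clusters, senses):
    """Purity = (1/N) * sum over clusters of the count of the cluster's majority sense.

    clusters: cluster id assigned to each occurrence (hypothetical input format)
    senses:   gold sense label for the same occurrence
    """
    if len(clusters) != len(senses):
        raise ValueError("clusters and senses must have the same length")
    if not clusters:
        raise ValueError("need at least one occurrence")
    # Count gold senses separately inside each cluster.
    by_cluster = defaultdict(Counter)
    for cluster_id, sense in zip(clusters, senses):
        by_cluster[cluster_id][sense] += 1
    # Each cluster contributes the size of its most frequent sense.
    majority_total = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return majority_total / len(clusters)


# Hypothetical toy data: six occurrences of "zamek" grouped into two clusters.
clusters = [0, 0, 0, 1, 1, 1]
senses = ["castle", "castle", "lock", "lock", "lock", "lock"]
print(cluster_purity(clusters, senses))  # 5/6, about 0.833
```

Purity behaves like the precision of labelling every cluster with its majority sense, which is why it can be compared with supervised precision. It also rises trivially as the number of clusters grows (singleton clusters give purity 1.0), so it is normally read together with the number of clusters produced.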

Keywords

natural language processing; pattern clustering; text analysis; Polish; clustering algorithms; word sense disambiguation; Algorithm design and analysis; Clustering algorithms; Context; Feature extraction; Mutual information; Neurons; Partitioning algorithms

fLanguage

English

Publisher

IEEE

Conference_Title

Computer Science and Information Technology (IMCSIT), Proceedings of the 2010 International Multiconference on

Conference_Location

Wisla

ISSN

2157-5525

Print_ISBN

978-1-4244-6432-6

Type

conf

DOI

10.1109/IMCSIT.2010.5679861

Filename

5679861