• DocumentCode
    538059
  • Title

    Evaluation of clustering algorithms for Polish Word Sense Disambiguation

  • Author

    Broda, Bartosz ; Mazur, Wojciech

  • Author_Institution
    Inst. of Inf., Wroclaw Univ. of Technol., Wrocław, Poland
  • fYear
    2010
  • fDate
    18-20 Oct. 2010
  • Firstpage
    25
  • Lastpage
    32
  • Abstract
    Word Sense Disambiguation in text remains a difficult problem because the best supervised methods require laborious and costly manual preparation of training data. This work therefore evaluates several clustering algorithms on the task of Word Sense Disambiguation for Polish. We tested six clustering algorithms (K-Means, K-Medoids, hierarchical agglomerative clustering, hierarchical divisive clustering, Growing Hierarchical Self-Organising Maps, and graph-partitioning-based clustering) and five weighting schemes. For the agglomerative and divisive algorithms, 13 criterion functions were tested. The results are interesting because the best clustering algorithms come close, in terms of cluster purity, to the precision of a supervised clustering algorithm on the same dataset using the same features.
  • Keywords
    natural language processing; pattern clustering; text analysis; Polish; clustering algorithms; word sense disambiguation; Algorithm design and analysis; Context; Feature extraction; Mutual information; Neurons; Partitioning algorithms
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Information Technology (IMCSIT), Proceedings of the 2010 International Multiconference on
  • Conference_Location
    Wisla
  • ISSN
    2157-5525
  • Print_ISBN
    978-1-4244-6432-6
  • Type

    conf

  • DOI
    10.1109/IMCSIT.2010.5679861
  • Filename
    5679861