  • DocumentCode
    3399117
  • Title
    Fuzzy Pseudo-Thesaurus Based Clustering of a Folkloristic Corpus
  • Author
    Szaszkó, S.; Kóczy, L.T.; Gedeon, T.D.
  • Author_Institution
    Budapest University of Technology and Economics
  • fYear
    2005
  • fDate
    25 May 2005
  • Firstpage
    126
  • Lastpage
    131
  • Abstract
    Automatic thesaurus extraction is essential for modern information retrieval. We develop a method for building a fuzzy pseudo-thesaurus from the co-occurrence of word pairs in documents. We show that taking into account the degree of word frequency, computed over the whole corpus, makes the resulting pseudo-thesaurus usable. We identified parameter settings for which a human expert confirmed that most of the extracted word pairs are related. The relationship within the extracted pairs and groups of words is often looser than synonymy, but these pairs and groups identify the recurring topics of the corpus. We propose using groups of closely related words to define distinct topics, and on this basis we cluster the documents (Chakrabarty et al., 1999).
  • Keywords
    fuzzy set theory; fuzzy systems; information retrieval; text analysis; thesauri; automatic thesaurus extraction; document clustering; document word pair cooccurrence; folkloristic corpus; fuzzy information retrieval; fuzzy pseudothesaurus based clustering; fuzzy thesaurus; synonymy; word frequency degree; Data mining; Dictionaries; Frequency estimation; Frequency measurement; Fuzzy systems; Humans; Information retrieval; Natural languages; Terminology; Thesauri;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    The 14th IEEE International Conference on Fuzzy Systems (FUZZ '05), 2005
  • Conference_Location
    Reno, NV
  • Print_ISBN
    0-7803-9159-4
  • Type
    conf
  • DOI
    10.1109/FUZZY.2005.1452380
  • Filename
    1452380
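The abstract outlines a pipeline: relate word pairs by document co-occurrence, filter words by their corpus-wide frequency, extract groups of closely related words, and cluster documents by those groups. The record does not give the authors' formulas, so the following is only a minimal sketch under stated assumptions. The frequency filter (`min_df`, `max_df`), the degree measure (co-occurring documents divided by the document count of the rarer word), the alpha-cut threshold `alpha`, and all function names are hypothetical illustrations, not the paper's method.

```python
from collections import Counter
from itertools import combinations


def build_pseudo_thesaurus(docs, min_df=0.05, max_df=0.5):
    """Return a fuzzy relation {(a, b): degree} over word pairs (a < b).

    ASSUMPTION: words are kept only if the fraction of documents containing
    them lies in [min_df, max_df] (a stand-in for the corpus-wide word
    frequency degree). The degree of a pair is the number of documents
    containing both words divided by the document count of the rarer word.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    df = Counter(w for s in doc_sets for w in s)  # document frequency per word
    vocab = {w for w, c in df.items() if min_df <= c / n <= max_df}
    co = Counter()
    for s in doc_sets:
        for a, b in combinations(sorted(s & vocab), 2):
            co[(a, b)] += 1  # count documents where both words occur
    return {(a, b): c / min(df[a], df[b]) for (a, b), c in co.items()}


def word_groups(relation, alpha=0.8):
    """Group words linked by pairs with degree >= alpha (an alpha-cut),
    taking connected components with a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), degree in relation.items():
        if degree >= alpha:
            parent[find(a)] = find(b)
    groups = {}
    for w in parent:
        groups.setdefault(find(w), set()).add(w)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def cluster_documents(docs, groups):
    """Label each document with the index of the word group it shares the
    most words with, or None if it shares no word with any group."""
    labels = []
    for d in docs:
        s = set(d)
        overlaps = [len(s & set(g)) for g in groups]
        best = max(range(len(groups)), key=overlaps.__getitem__, default=None)
        labels.append(best if best is not None and overlaps[best] > 0 else None)
    return labels


if __name__ == "__main__":
    corpus = [
        ["ballad", "singer", "song"],
        ["ballad", "singer"],
        ["harvest", "dance"],
        ["harvest", "dance", "song"],
    ]
    rel = build_pseudo_thesaurus(corpus, min_df=0.0, max_df=1.0)
    topics = word_groups(rel, alpha=0.8)
    print(topics)                               # [['ballad', 'singer'], ['dance', 'harvest']]
    print(cluster_documents(corpus, topics))    # [0, 0, 1, 1]
```

In this toy corpus, "song" co-occurs with both topics, so its pair degrees stay at 0.5 and fall below the alpha-cut. This mirrors, in a simplified form, the abstract's point that extracted groups capture recurring topics rather than strict synonymy.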