• DocumentCode
    2346702
  • Title

    Sense-based clustering of Polish nouns in the extraction of semantic relatedness

  • Author

    Broda, Bartosz ; Piasecki, Maciej ; Szpakowicz, Stanislaw

  • Author_Institution
    Inst. of Appl. Inf., Wroclaw Univ. of Technol., Wroclaw
  • fYear
    2008
  • fDate
    20-22 Oct. 2008
  • Firstpage
    83
  • Lastpage
    89
  • Abstract
    The construction of a wordnet from scratch requires intelligent software support. An accurate measure of semantic relatedness can be used to extract groups of semantically close words from a corpus. Such groups help a lexicographer make decisions about synset membership and synset placement in the network. We have adapted to Polish the well-known algorithm of Clustering by Committee, and tested it on the largest Polish corpus available. The evaluation by way of a plWordNet-based synonymy test used Polish WordNet, a resource still under development. The results are consistent with a few benchmarks, but not encouraging enough yet to make a wordnet writer's support tool immediately useful.
  • Keywords
    natural language processing; software engineering; Polish WordNet; Polish nouns; intelligent software support; lexicographer; plWordNet-based synonymy test; semantic relatedness extraction; sense-based clustering; synset membership; synset placement; wordnet; Benchmark testing; Clustering algorithms; Computer science; Data mining; Helium; Informatics; Information technology; Large-scale systems; Mutual information; Software algorithms;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Computer Science and Information Technology, 2008. IMCSIT 2008. International Multiconference on
  • Conference_Location
    Wisla
  • Print_ISBN
    978-83-60810-14-9
  • Type

    conf

  • DOI
    10.1109/IMCSIT.2008.4747222
  • Filename
    4747222