Sense-based clustering of Polish nouns in the extraction of semantic relatedness

Author

Broda, Bartosz ; Piasecki, Maciej ; Szpakowicz, Stanislaw

Author_Institution

Inst. of Appl. Inf., Wroclaw Univ. of Technol., Wroclaw

fYear

2008

fDate

20-22 Oct. 2008

Firstpage

83

Lastpage

89

Abstract

The construction of a wordnet from scratch requires intelligent software support. An accurate measure of semantic relatedness can be used to extract groups of semantically close words from a corpus. Such groups help a lexicographer make decisions about synset membership and synset placement in the network. We have adapted to Polish the well-known algorithm of Clustering by Committee, and tested it on the largest Polish corpus available. The evaluation by way of a plWordNet-based synonymy test used Polish WordNet, a resource still under development. The results are consistent with a few benchmarks, but not encouraging enough yet to make a wordnet writer's support tool immediately useful.

Keywords

natural language processing; software engineering; Polish WordNet; Polish nouns; intelligent software support; lexicographer; plWordNet-based synonymy test; semantic relatedness extraction; sense-based clustering; synset membership; synset placement; wordnet; Benchmark testing; Clustering algorithms; Computer science; Data mining; Helium; Informatics; Information technology; Large-scale systems; Mutual information; Software algorithms

fLanguage

English

Publisher

IEEE

Conference_Titel

Computer Science and Information Technology, 2008. IMCSIT 2008. International Multiconference on

Conference_Location

Wisła

Print_ISBN

978-83-60810-14-9

Type

conf

DOI

10.1109/IMCSIT.2008.4747222

Filename

4747222