DocumentCode
3622710
Title
Comparison of collocation extraction measures for document indexing
Author
S. Petrovic;J. Snajder;B. Dalbelo-Basic;M. Kolar
Author_Institution
Fac. of Electr. Eng. & Comput., Zagreb Univ.
fYear
2006
fDate
2006
Firstpage
451
Lastpage
456
Abstract
Automatic extraction of collocations from a corpus is a well-known problem in natural language processing. It is typically carried out using a statistical measure that indicates whether two words occur together more often than would be expected by chance. Because many such measures have been proposed by various authors, we compare some of them on the task of extracting collocations from a corpus of Croatian legal documents for the purpose of document indexing. We also propose and evaluate extensions of these measures to collocations consisting of three words.
Keywords
"Indexing","Data mining","Natural language processing","Law","Legal factors","Statistics","Computational linguistics","Stock markets","Cancer","Guns"
Publisher
IEEE
Conference_Titel
Information Technology Interfaces, 2006. 28th International Conference on
ISSN
1330-1012
Print_ISBN
953-7138-05-4
Type
conf
DOI
10.1109/ITI.2006.1708523
Filename
1708523