Comparison of collocation extraction measures for document indexing

Author

S. Petrovic;J. Snajder;B. Dalbelo-Basic;M. Kolar

Author_Institution

Fac. of Electr. Eng. & Comput., Zagreb Univ.

fYear

2006

fDate

2006

Firstpage

451

Lastpage

456

Abstract

Automatic extraction of collocations from a corpus is a well-known problem in natural language processing. It is typically carried out using a statistical measure that indicates whether two words occur together more often than would be expected by chance. Because many such measures have been proposed by various authors, we have compared some of them on the task of extracting collocations from a corpus of Croatian legal documents for the purpose of document indexing. We also propose and evaluate extensions of these measures to collocations consisting of three words.
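
The abstract does not name the measures that were compared. As a minimal illustration of the general idea, the sketch below scores adjacent word pairs with pointwise mutual information (PMI), one commonly used association measure. It is not claimed to be the authors' method; the function name `pmi_scores`, the `min_count` parameter, and the toy sentence are assumptions made for this example.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=1):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration only; the measures used in the
    paper are not listed in the abstract.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip pairs that are too rare to score reliably
        p_pair = count / n_bi
        # Probability of the pair if the two words were independent.
        p_independent = (unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)
        # Positive PMI: the pair occurs more often than chance predicts.
        scores[(w1, w2)] = math.log2(p_pair / p_independent)
    return scores


# Toy input (an assumption, not data from the paper).
tokens = "the supreme court ruled that the supreme court may review the ruling".split()
scores = pmi_scores(tokens)
ranked = sorted(scores, key=scores.get, reverse=True)
```

PMI tends to favor rare pairs, so a frequency threshold such as `min_count` is often applied before ranking candidates.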

Keywords

"Indexing","Data mining","Natural language processing","Law","Legal factors","Statistics","Computational linguistics","Stock markets","Cancer","Guns"

Publisher

IEEE

Conference_Title

Information Technology Interfaces, 2006. 28th International Conference on

ISSN

1330-1012

Print_ISBN

953-7138-05-4

Type

conf

DOI

10.1109/ITI.2006.1708523

Filename

1708523

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3622710