• DocumentCode
    3622710
  • Title
    Comparison of collocation extraction measures for document indexing
  • Author
    S. Petrovic; J. Snajder; B. Dalbelo-Basic; M. Kolar
  • Author_Institution
    Fac. of Electr. Eng. & Comput., Zagreb Univ.
  • fYear
    2006
  • fDate
    2006
  • Firstpage
    451
  • Lastpage
    456
  • Abstract
    Automatic extraction of collocations from a corpus is a well-known problem in the field of natural language processing. It is typically carried out by employing some kind of statistical measure that indicates whether or not two words occur together more often than by chance. As there is an abundance of these measures proposed by various authors, we have compared some of them on the task of extracting collocations from a corpus of Croatian legal documents for the purpose of document indexing. We propose and evaluate extensions of these measures for collocations consisting of three words.
  • Keywords
    "Indexing","Data mining","Natural language processing","Law","Legal factors","Statistics","Computational linguistics","Stock markets","Cancer","Guns"
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology Interfaces, 2006. 28th International Conference on
  • ISSN
    1330-1012
  • Print_ISBN
    953-7138-05-4
  • Type
    conf
  • DOI
    10.1109/ITI.2006.1708523
  • Filename
    1708523