• DocumentCode
    441984
  • Title
    Multistage Chinese collocation extraction
  • Author
    Xu, Rui-Feng ; Lu, Qin
  • Author_Institution
    Dept. of Comput., Hong Kong Polytech. Univ., China
  • Volume
    5
  • fYear
    2005
  • fDate
    18-21 Aug. 2005
  • Firstpage
    3254
  • Abstract
    A collocation is a recurrent, conventional natural language expression. In this research, Chinese collocations are categorized into four types. Based on a statistical analysis of typical collocations of each type, a multi-stage, window-based collocation extraction system is designed, in which lexical statistics, synonym information, syntactic information, and dependency knowledge are used to extract n-gram collocations and the different types of bi-gram collocations separately. Experimental results show that this system achieves better precision and recall than existing statistical collocation extraction techniques.
  • Keywords
    computational linguistics; natural languages; statistical analysis; dependency knowledge; lexical statistic; multistage Chinese collocation extraction; natural language expression; natural language processing; statistical collocation extraction techniques; synonyms information; syntactic information; Computational linguistics; Data mining; Frequency; Information retrieval; Natural language processing; Natural languages; Statistical analysis; Statistical distributions; Statistics; Testing; Collocation extraction; multi-stage extraction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
  • Conference_Location
    Guangzhou, China
  • Print_ISBN
    0-7803-9091-1
  • Type
    conf
  • DOI
    10.1109/ICMLC.2005.1527504
  • Filename
    1527504