• DocumentCode
    2990920
  • Title

    A Method for New Word Extraction on Chinese Large-scale Query Logs

  • Author

    Sun, Rui ; Jin, Peng ; Lai, Juan

  • Author_Institution
    Lab. of Intell. Inf. Process. & Applic., Leshan Normal Univ., Leshan, China
  • fYear
    2011
  • fDate
    3-4 Dec. 2011
  • Firstpage
    1256
  • Lastpage
    1259
  • Abstract
    Chinese word segmentation is a fundamental, difficult, and important problem in natural language processing, and new words are a bottleneck for it. Search engine query logs reflect user habits and behaviors and contain many popular words and professional terms. We present a new word extraction method that combines grammar rules with statistical information. Experiments were run on the Sogou query logs. The method achieves a precision of 59.9%, a recall of 44.1%, and an F-measure of 50.8%. The experimental results show that the method is extensible and effective, and that it can help improve the performance of word segmentation systems.
  • Keywords
    grammars; natural language processing; query processing; search engines; statistical analysis; Chinese large scale query logs; Chinese word segmentation; F-measure; Sogou query logs; grammar rules; natural language processing; new word extraction; search engine; statistics information; Data mining; Educational institutions; Grammar; Laboratories; Natural language processing; Search engines; Semantics; New Word Extraction; Query Log; Search Engine;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Security (CIS), 2011 Seventh International Conference on
  • Conference_Location
    Hainan
  • Print_ISBN
    978-1-4577-2008-6
  • Type

    conf

  • DOI
    10.1109/CIS.2011.278
  • Filename
    6128319
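
  • Note
    The three scores in the abstract are consistent with each other if the F-measure is taken to be the balanced F1 score, that is, the harmonic mean of precision and recall. The record does not say which F-measure variant the authors used, so the balanced form is an assumption here. The function name f_measure and its beta parameter are illustrative and do not come from the paper. A minimal sketch of the check:

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score for precision and recall given as fractions.

    beta = 1.0 gives the balanced F1 score (harmonic mean), which is
    assumed here; the record does not state the variant used.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta**2 * precision + recall
    if denominator == 0:
        # Both scores are zero, so the F-measure is defined as zero.
        return 0.0
    return (1 + beta**2) * precision * recall / denominator


# Values reported in the abstract, expressed as fractions.
precision = 0.599
recall = 0.441

score = f_measure(precision, recall)
print(f"F-measure: {score * 100:.1f}%")
```

    With the reported precision of 59.9% and recall of 44.1%, this gives 2 × 0.599 × 0.441 / (0.599 + 0.441) ≈ 0.508, matching the 50.8% stated in the abstract.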