Title :
A Method for New Word Extraction on Chinese Large-scale Query Logs
Author :
Sun, Rui ; Jin, Peng ; Lai, Juan
Author_Institution :
Lab. of Intell. Inf. Process. & Applic., Leshan Normal Univ., Leshan, China
Abstract :
Chinese word segmentation is a fundamental, difficult, and important problem in natural language processing, and new words are a bottleneck for it. Search engine query logs reflect user habits and behavior, and they contain many popular words and professional terms. We present a new word extraction method that combines grammar rules with statistical information. In experiments on the Sogou query logs, the method achieves a precision of 59.9%, a recall of 44.1%, and an F-measure of 50.8%. The experimental results indicate that the method is extensible and effective, and that it can support improving the performance of word segmentation systems.
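The three reported scores can be cross-checked against each other. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is F1 with equal weighting.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, expressed as fractions.
reported_precision = 0.599
reported_recall = 0.441

f1 = f_measure(reported_precision, reported_recall)
print(f"F-measure: {f1:.1%}")
```

Under that assumption, the computed value rounds to the 50.8% reported in the abstract, so the three figures are mutually consistent.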
Keywords :
grammars; natural language processing; query processing; search engines; statistical analysis; Chinese large scale query logs; Chinese word segmentation; F-measure; Sogou query logs; grammar rules; natural language processing; new word extraction; search engine; statistics information; Data mining; Educational institutions; Grammar; Laboratories; Natural language processing; Search engines; Semantics; New Word Extraction; Query Log; Search Engine;
Conference_Title :
Computational Intelligence and Security (CIS), 2011 Seventh International Conference on
Conference_Location :
Hainan
Print_ISBN :
978-1-4577-2008-6
DOI :
10.1109/CIS.2011.278