DocumentCode
2990920
Title
A Method for New Word Extraction on Chinese Large-scale Query Logs
Author
Sun, Rui ; Jin, Peng ; Lai, Juan
Author_Institution
Lab. of Intell. Inf. Process. & Applic., Leshan Normal Univ., Leshan, China
fYear
2011
fDate
3-4 Dec. 2011
Firstpage
1256
Lastpage
1259
Abstract
Chinese word segmentation is a basic, difficult, and important problem in natural language processing, and new words are a bottleneck for it. Search engine query logs reflect user habits and behaviors and contain many popular words and professional terms. We present a new word extraction method that combines grammar rules with statistical information. Experiments were run on the Sogou query logs. The method achieves a precision of 59.9%, a recall of 44.1%, and an F-measure of 50.8%. The experimental results show that the method is extensible and effective, and that it can help improve the performance of word segmentation systems.
Keywords
grammars; natural language processing; query processing; search engines; statistical analysis; Chinese large scale query logs; Chinese word segmentation; F-measure; Sogou query logs; grammar rules; natural language processing; new word extraction; search engine; statistics information; Data mining; Educational institutions; Grammar; Laboratories; Natural language processing; Search engines; Semantics; New Word Extraction; Query Log; Search Engine;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence and Security (CIS), 2011 Seventh International Conference on
Conference_Location
Hainan
Print_ISBN
978-1-4577-2008-6
Type
conf
DOI
10.1109/CIS.2011.278
Filename
6128319