Title :
SVM-based Hybrid Pattern for New Word Discovery
Author :
Yang, Hui ; Zhang, Yue-jie ; Zhang, Tao
Author_Institution :
Fudan Univ., Shanghai
Date :
Aug. 30 2007-Sept. 1 2007
Abstract :
New words pose additional challenges for Chinese word segmentation. This paper presents an SVM-based hybrid pattern for new word discovery that integrates the advantages of statistics-based and rule-based methods to improve discovery performance. In the statistics module, new word discovery is defined as a binary classification problem. In addition to previously used new word features, we propose context information and affix information as new features, together with constraints that capture the relationships among new word candidates. Finally, some rules are introduced to further improve performance. In the experiment, new words are simulated by revising the dictionary of a Natural Language Processing (NLP) system. The results show that these features and constraints are useful for new word discovery: the F-measure is 64.62%, which is 7% higher than the baseline.
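The abstract names context and affix features and reports an F-measure, but gives no formulas. The sketch below is only an illustration of what such pieces might look like, not the authors' method. The function names, the use of neighbour-character entropy as the "context information", and the affix set are all assumptions introduced here. The resulting feature dictionaries would then be passed to a binary SVM classifier, which is not shown.

```python
import math
from collections import Counter


def entropy(counter):
    """Shannon entropy (bits) of a Counter of neighbouring characters."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counter.values())


def candidate_features(candidate, sentences, affixes):
    """Hypothetical features for one new-word candidate.

    Assumption: "context information" is approximated by the entropy of the
    characters seen immediately left and right of the candidate, and "affix
    information" by whether its first or last character is in `affixes`.
    """
    if not candidate:
        raise ValueError("candidate must be a non-empty string")
    freq = 0
    left, right = Counter(), Counter()
    for s in sentences:
        start = s.find(candidate)
        while start != -1:
            freq += 1
            end = start + len(candidate)
            # Use boundary markers when the candidate touches a sentence edge.
            left[s[start - 1] if start > 0 else "<s>"] += 1
            right[s[end] if end < len(s) else "</s>"] += 1
            start = s.find(candidate, start + 1)
    return {
        "freq": freq,
        "left_entropy": entropy(left),
        "right_entropy": entropy(right),
        "has_affix": candidate[0] in affixes or candidate[-1] in affixes,
    }


def f_measure(predicted, gold):
    """Balanced F-measure (F1) between predicted and gold sets of new words."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Here `f_measure` is the standard harmonic mean of precision and recall; whether the paper weights precision and recall equally is not stated in the abstract.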
Keywords :
data mining; dictionaries; natural language processing; pattern classification; support vector machines; Chinese word segmentation; F-measure; SVM-based hybrid pattern; binary classification problem; dictionary; natural language processing; new word discovery; rule-based method; statistics-based method; Computer science; Dictionaries; Finance; Frequency; Information management; Information processing; Laboratories; Probability; Statistics; Support vector machines;
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-1611-0
Electronic_ISBN :
978-1-4244-1611-0
DOI :
10.1109/NLPKE.2007.4368013