DocumentCode
2710568
Title
MWUs Extraction Based on Continuous Measurement of Inter-word Association with Frequency Adjustment
Author
Wang, Zhifei ; Chen, Yue ; Jiang, XiaoYu
fYear
2010
fDate
7-10 May 2010
Firstpage
647
Lastpage
651
Abstract
Extracting Multi-Word Units (MWUs) from raw text is a significant problem in natural language processing because MWUs describe concepts more accurately than single words. Statistical methods such as Mutual Information, the Log-Likelihood Ratio, and the Chi-Squared test rely heavily on word frequency, because the component words of MWUs tend to co-occur more often and the main components of multi-word phrases are the core terms of the text document. These core terms generally have very high frequency and strong word-building power, so their frequency is far higher than that of the other component words of MWUs, which reduces the accuracy of these methods. We propose a method to adjust the frequency of the core words. Experimental results show that the method significantly improves the recall of multi-word combinations while preserving precision.
Keywords
natural language processing; statistical analysis; text analysis; word processing; MWU extraction; continuous measurement; frequency adjustment; interword association; multiword combination; multiword unit; natural language processing; statistical method; text document; word building power; Data mining; Filtering; Frequency measurement; Large-scale systems; Mutual information; Natural language processing; Natural languages; Ontologies; Statistical analysis; Testing; Association Measurement; Frequency Adjustment; MWUs Extraction; Mutual Information; Term Extraction;
fLanguage
English
Publisher
ieee
Conference_Titel
Computer Research and Development, 2010 Second International Conference on
Conference_Location
Kuala Lumpur
Print_ISBN
978-0-7695-4043-6
Type
conf
DOI
10.1109/ICCRD.2010.140
Filename
5489550