DocumentCode :
2710568
Title :
MWUs Extraction Based on Continuous Measurement of Inter-word Association with Frequency Adjustment
Author :
Wang, Zhifei ; Chen, Yue ; Jiang, XiaoYu
fYear :
2010
fDate :
7-10 May 2010
Firstpage :
647
Lastpage :
651
Abstract :
Extracting Multi-Word Units (MWUs) from raw text is a significant problem in natural language processing because MWUs describe concepts more accurately than single words. Statistical methods such as Mutual Information, the Log-Likelihood Ratio, and the Chi-Squared test rely heavily on word frequency, because the component words of MWUs tend to co-occur more often and the main components of multi-word phrases are the core terms of the text document. These core terms generally have very high frequency and strong word-building power, so their frequency is far higher than that of the other component words of MWUs, which reduces the accuracy of these methods. We propose a method that adjusts the frequency of the core words. Experimental results show that the method significantly improves the recall of multi-word combinations while preserving precision.
Keywords :
natural language processing; statistical analysis; text analysis; word processing; MWU extraction; continuous measurement; frequency adjustment; interword association; multiword combination; multiword unit; natural language processing; statistical method; text document; word building power; Data mining; Filtering; Frequency measurement; Large-scale systems; Mutual information; Natural language processing; Natural languages; Ontologies; Statistical analysis; Testing; Association Measurement; Frequency Adjustment; MWUs Extraction; Mutual Information; Term Extraction;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Research and Development, 2010 Second International Conference on
Conference_Location :
Kuala Lumpur
Print_ISBN :
978-0-7695-4043-6
Type :
conf
DOI :
10.1109/ICCRD.2010.140
Filename :
5489550