DocumentCode :
2347784
Title :
Automatic filtration of multiword units
Author :
Liu, Ying ; Tie, Zheng
Author_Institution :
Dept. of Chinese Language & Literature, Tsinghua Univ., Beijing, China
fYear :
2010
fDate :
21-23 Aug. 2010
Firstpage :
1
Lastpage :
4
Abstract :
This paper studies how to filter multiword units. We use normalized expectation (NE) to extract multiword unit candidates from a patent corpus. The candidates are then filtered using stop words, frequency, first stop words, last stop words, and contextual entropy. Experimental results show that the precision of the extracted multiword units improves by 8.7% after filtering.
Keywords :
information filtering; text analysis; automatic filtration; multiword units; normalized expectation; patent corpus; Irrigation; Presses; contextual entropy; extract; filtrate; multiword unit;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6896-6
Type :
conf
DOI :
10.1109/NLPKE.2010.5587783
Filename :
5587783