DocumentCode
2347784
Title
Automatic filtration of multiword units
Author
Liu, Ying ; Tie, Zheng
Author_Institution
Dept. of Chinese Language & Literature, Tsinghua Univ., Beijing, China
fYear
2010
fDate
21-23 Aug. 2010
Firstpage
1
Lastpage
4
Abstract
This paper studies how to filter multiword units. We use normalized expectation (NE) to extract multiword unit candidates from a patent corpus. The candidates are then filtered using stop words, frequency, first stop words, last stop words, and contextual entropy. Experimental results show that the precision rate of multiword units improves by 8.7% after filtration.
Keywords
information filtering; text analysis; automatic filtration; multiword units; normalized expectation; patent corpus; Irrigation; Presses; contextual entropy; extract; filtrate; multiword unit
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-6896-6
Type
conf
DOI
10.1109/NLPKE.2010.5587783
Filename
5587783