Title :
Automatic filtration of multiword units
Author :
Liu, Ying ; Tie, Zheng
Author_Institution :
Dept. of Chinese Language & Literature, Tsinghua Univ., Beijing, China
Abstract :
This paper studies how to filter multiword units. We use normalized expectation (NE) to extract multiword unit candidates from a patent corpus. The candidates are then filtered using stop words, frequency, first stop words, last stop words, and contextual entropy. The experimental results show that the precision of the extracted multiword units improves by 8.7% after filtration.
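The filtering stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas or thresholds, so the sketch assumes contextual entropy is the Shannon entropy of the tokens adjacent to a candidate, and the names `STOP_WORDS`, `collect_stats`, `keep_candidate`, `min_freq`, and `min_entropy` are hypothetical. It covers only the frequency, first/last stop word, and contextual entropy filters, and it takes plain n-grams as candidates rather than NE-ranked ones.

```python
import math
from collections import Counter

# Illustrative stop-word list; the paper's actual list is not given in the abstract.
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "to", "is"}

def context_entropy(neighbors):
    """Shannon entropy (in bits) of the tokens observed next to a candidate."""
    counts = Counter(neighbors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def collect_stats(tokens, n):
    """Count every n-gram and record its left and right neighbor tokens."""
    freq = Counter()
    left, right = {}, {}
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        freq[gram] += 1
        if i > 0:
            left.setdefault(gram, []).append(tokens[i - 1])
        if i + n < len(tokens):
            right.setdefault(gram, []).append(tokens[i + n])
    return freq, left, right

def keep_candidate(gram, freq, left, right, min_freq=2, min_entropy=0.5):
    """Apply the three filters; both thresholds are assumed values."""
    if freq[gram] < min_freq:
        return False  # frequency filter
    if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
        return False  # first / last stop-word filter
    h_left = context_entropy(left.get(gram, []))
    h_right = context_entropy(right.get(gram, []))
    # A candidate that always appears in the same context is likely a
    # fragment of a longer unit, so require varied context on both sides.
    return min(h_left, h_right) >= min_entropy

tokens = "the fuel cell stack heats a fuel cell unit and the fuel cell module".split()
freq, left, right = collect_stats(tokens, 2)
kept = [g for g in freq if keep_candidate(g, freq, left, right)]
```

On this toy sentence only the bigram `("fuel", "cell")` survives: it occurs three times, neither edge is a stop word, and its neighbors vary on both sides. A real pipeline would apply the same checks to candidates ranked by NE.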
Keywords :
information filtering; text analysis; automatic filtration; multiword units; normalized expectation; patent corpus; Irrigation; Presses; contextual entropy; extract; filtrate; multiword unit
Conference_Title :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6896-6
DOI :
10.1109/NLPKE.2010.5587783