Title :
Unsupervised text pattern learning using minimum description length
Author :
Wu, Ke ; Yu, Jiangsheng ; Wang, Hanpin ; Cheng, Fei
Author_Institution :
Dept. of Comput. Sci. & Technol., Peking Univ., Beijing, China
Abstract :
Knowledge of text patterns in a domain-specific corpus is valuable in many natural language processing (NLP) applications, such as information extraction and question-answering systems. In this paper, we propose a simple but effective probabilistic language model for modeling the indecomposability of text patterns. Under the minimum description length (MDL) principle, we implement an efficient unsupervised learning algorithm, and an experiment on an English critical writing corpus shows promising coverage of patterns compared with a human summary.
Keywords :
computational linguistics; learning (artificial intelligence); natural language processing; probability; text analysis; English; critical writing corpus; minimum description length; probabilistic language model; text pattern; unsupervised learning; Dictionaries; Humans; Merging; Probabilistic logic
Conference_Title :
Universal Communication Symposium (IUCS), 2010 4th International
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-7821-7
DOI :
10.1109/IUCS.2010.5666227