Title :
Naive Bayes Modeling with Proper Smoothing for Information Extraction
Author :
Gu, Zhenmei ; Cercone, Nick
Author_Institution :
Waterloo Univ., Waterloo
Abstract :
Information extraction (IE) summarizes a collection of documents into a structured representation by identifying specific facts in text. The naive Bayes model is one of the first statistical models applied to IE for learning extraction patterns from labeled data. Despite the simplicity and popularity of the naive Bayes model, we have observed a formulation problem in previous work on naive Bayes IE. In this paper, we present a formal naive Bayes model for IE, from which we derive a more theoretically sound formula for estimating filler probabilities. We also address smoothing techniques to overcome the data sparseness problem. Our proposed smoothing strategy is shown to be critical to the robustness of a naive Bayes IE system. Experimental results show that our naive Bayes IE systems achieve better extraction performance than related work.
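To illustrate why smoothing matters for naive Bayes filler estimation, here is a minimal sketch. It is NOT the authors' formulation or their smoothing strategy; both are described only in the paper itself. The sketch assumes a simple two-class setup ("filler" vs "other" fragments). It also swaps in plain add-k (Lidstone) smoothing as a generic stand-in technique. The class name NaiveBayesFillerScorer and its methods are hypothetical.

```python
import math
from collections import Counter


class NaiveBayesFillerScorer:
    """Hypothetical sketch: scores a candidate text fragment as a slot filler
    with a two-class naive Bayes model and add-k (Lidstone) smoothing.
    This is a generic illustration, not the method of the paper."""

    def __init__(self, k=1.0):
        self.k = k  # smoothing constant; k = 1.0 gives Laplace (add-one) smoothing
        self.counts = {"filler": Counter(), "other": Counter()}
        self.n_fragments = {"filler": 0, "other": 0}
        self.vocab = set()

    def fit(self, fragments):
        # fragments: iterable of (tokens, label), label in {"filler", "other"}
        for tokens, label in fragments:
            self.counts[label].update(tokens)
            self.n_fragments[label] += 1
            self.vocab.update(tokens)

    def _log_p_token(self, token, label):
        # Lidstone estimate (c(w) + k) / (N + k * (|V| + 1)); the extra +1
        # reserves probability mass for one pooled "unseen word" type, so a
        # token never observed with this label still gets nonzero probability
        c = self.counts[label]
        total = sum(c.values())
        v = len(self.vocab) + 1
        return math.log((c[token] + self.k) / (total + self.k * v))

    def log_odds(self, tokens):
        # log P(filler | tokens) - log P(other | tokens), treating tokens as
        # conditionally independent given the class (the naive Bayes assumption)
        score = math.log(self.n_fragments["filler"]) - math.log(self.n_fragments["other"])
        for t in tokens:
            score += self._log_p_token(t, "filler") - self._log_p_token(t, "other")
        return score

    def prob_filler(self, tokens):
        # Convert the log-odds to a posterior probability with the logistic function
        return 1.0 / (1.0 + math.exp(-self.log_odds(tokens)))


if __name__ == "__main__":
    train = [
        (["dr", "smith"], "filler"),
        (["prof", "jones"], "filler"),
        (["the", "seminar"], "other"),
        (["will", "be", "held"], "other"),
    ]
    scorer = NaiveBayesFillerScorer(k=1.0)
    scorer.fit(train)
    print(scorer.prob_filler(["dr", "jones"]))
    print(scorer.prob_filler(["the", "seminar"]))
```

With k = 0 (no smoothing), any token seen in only one class drives the other class's probability to zero and the log-odds becomes undefined. This is the data sparseness failure that smoothing is meant to prevent. The paper's own smoothing strategy may differ substantially from add-k.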
Keywords :
Bayes methods; information retrieval; learning (artificial intelligence); probability; smoothing methods; statistical analysis; text analysis; AI learning; data sparseness problem; document summarization; filler probability estimation; information extraction; labeled data; naive Bayes modeling; smoothing technique; statistical model; structural representation; Bayesian methods; Computer science; Data mining; Estimation theory; Filling; Machine learning; Power system modeling; Power system reliability; Robustness; Smoothing methods;
Conference_Title :
Fuzzy Systems, 2006 IEEE International Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
0-7803-9488-7
DOI :
10.1109/FUZZY.2006.1681742