DocumentCode :
519722
Title :
Useful attributes identification for Unsupervised Information Extraction result set based on REAdaBoost Naïve Bayes
Author :
Yin, Wenke ; Zhu, Ming
Author_Institution :
Dept. of Autom., Univ. of Sci. & Technol. of China, Hefei, China
Volume :
1
fYear :
2010
fDate :
21-24 May 2010
Abstract :
Unsupervised Information Extraction has attracted great attention in the literature. However, its result sets inevitably include useless noise. Moreover, the proportions of useful attributes and noise in the result set are greatly imbalanced, and the two types of data differ in importance. How to identify the useful attributes effectively therefore remains an open question. To address this problem, this paper proposes a revised AdaBoost algorithm, REAdaBoost. The weight coefficient of REAdaBoost is determined not only by the precision on useful attributes but also by the recall on rare attributes. We use Naïve Bayes as the base classifier and boost it with AdaBoost and REAdaBoost separately. Experimental results show that, without increasing the overall error rate, REAdaBoost outperforms both AdaBoost and Naïve Bayes in predicting the useful attributes and the rare attributes.
Keywords :
Bayes methods; data mining; pattern classification; AdaBoost algorithm; REAdaBoost naive Bayes; attributes identification; unsupervised information extraction; weight coefficient; 1/f noise; Automation; Background noise; Data mining; Error analysis; Explosives; Internet; Large-scale systems; Web pages; Web sites; Classification; Imbalanced Class Distributions; Information Extraction; REAdaBoost
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Future Computer and Communication (ICFCC), 2010 2nd International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-5821-9
Type :
conf
DOI :
10.1109/ICFCC.2010.5497739
Filename :
5497739