Regional Information Center for Science and Technology - Useful attributes identification for Unsupervised Information Extraction result set based on REAdaBoost Naïve Bayes

DocumentCode :

519722

Title :

Useful attributes identification for Unsupervised Information Extraction result set based on REAdaBoost Naïve Bayes

Author :

Yin, Wenke ; Zhu, Ming

Author_Institution :

Dept. of Autom., Univ. of Sci. & Technol. of China, Hefei, China

Volume :

fYear :

2010

fDate :

21-24 May 2010

Abstract :

Unsupervised Information Extraction has attracted great attention in the literature. However, its result sets inevitably include useless noise. Moreover, useful attributes and noise are heavily imbalanced in the result set, and the two types of data differ in importance. How to identify the useful attributes effectively therefore remains an open question. To address this problem, this paper proposes a revised AdaBoost algorithm, REAdaBoost. The weight coefficient of REAdaBoost is determined not only by the precision for useful attributes but also by the recall for rare attributes. We use Naïve Bayes as the base classifier and apply AdaBoost and REAdaBoost to boost it separately. The experimental results show that, without increasing the overall error rate, REAdaBoost outperforms both AdaBoost and Naïve Bayes in predicting the useful attributes and the rare attributes.
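
The boosting setup described in the abstract can be illustrated with a minimal sketch. It boosts a small weighted Bernoulli Naïve Bayes classifier with standard discrete AdaBoost and, optionally, rescales each round's weight coefficient using the precision and recall of the positive ("useful") class. The abstract does not give the REAdaBoost formula, so the rescaling below is an assumption made purely for illustration, not the authors' method; the function names (`train_nb`, `predict_nb`, `boost`, `predict`), the `recall_aware` flag, and the smoothing constant are likewise hypothetical.

```python
import math

def train_nb(X, y, w):
    """Weighted Bernoulli Naive Bayes on binary features; labels are +1 / -1."""
    n, n_feat = len(X), len(X[0])
    s = 1.0 / n  # smoothing constant (assumed; scaled to the weight mass)
    model = {}
    for c in (1, -1):
        wc = sum(wi for wi, yi in zip(w, y) if yi == c)
        # Smoothed weighted probability that feature j equals 1 within class c
        probs = [
            (sum(wi for xi, yi, wi in zip(X, y, w) if yi == c and xi[j]) + s)
            / (wc + 2 * s)
            for j in range(n_feat)
        ]
        model[c] = (math.log(wc + 1e-12), probs)
    return model

def predict_nb(model, x):
    """Return the class (+1 or -1) with the higher log posterior score."""
    scores = {
        c: log_prior + sum(math.log(p if v else 1 - p) for p, v in zip(probs, x))
        for c, (log_prior, probs) in model.items()
    }
    return 1 if scores[1] >= scores[-1] else -1

def boost(X, y, rounds=10, recall_aware=False):
    """Discrete AdaBoost over Naive Bayes; returns a list of (alpha, model)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        model = train_nb(X, y, w)
        pred = [predict_nb(model, x) for x in X]
        err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        if err >= 0.5:
            break  # base classifier is no better than chance
        alpha = 0.5 * math.log((1 - err) / err)  # standard AdaBoost coefficient
        if recall_aware:
            tp = sum(1 for p, yi in zip(pred, y) if p == 1 and yi == 1)
            fp = sum(1 for p, yi in zip(pred, y) if p == 1 and yi == -1)
            fn = sum(1 for p, yi in zip(pred, y) if p == -1 and yi == 1)
            precision = tp / (tp + fp) if tp + fp else 0.0
            recall = tp / (tp + fn) if tp + fn else 0.0
            # ASSUMPTION: illustrative rescaling only, not the paper's formula
            alpha *= 0.5 + 0.25 * (precision + recall)
        ensemble.append((alpha, model))
        # Increase the weight of misclassified samples, then renormalise
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, pred)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all boosted classifiers."""
    score = sum(alpha * predict_nb(model, x) for alpha, model in ensemble)
    return 1 if score >= 0 else -1
```

With binary feature vectors `X` and labels `y` in {+1, -1}, `boost(X, y)` gives the plain AdaBoost baseline and `boost(X, y, recall_aware=True)` the hypothetical precision/recall-scaled variant, so the two can be compared on an imbalanced sample.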

Keywords :

Bayes methods; data mining; pattern classification; AdaBoost algorithm; REAdaBoost naive Bayes; attributes identification; unsupervised information extraction; weight coefficient; 1/f noise; Automation; Background noise; Data mining; Error analysis; Explosives; Internet; Large-scale systems; Web pages; Web sites; Classification; Imbalanced Class Distributions; Information Extraction; REAdaBoost

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Future Computer and Communication (ICFCC), 2010 2nd International Conference on

Conference_Location :

Wuhan

Print_ISBN :

978-1-4244-5821-9

Type :

conf

DOI :

10.1109/ICFCC.2010.5497739

Filename :

5497739

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=519722