DocumentCode :
2728981
Title :
Baum-Welch Style EM Approach on Simple Bayesian Models for Web Data Annotation
Author :
Gelgi, Fatih ; Davulcu, Hasan
Author_Institution :
Arizona State Univ., Tempe
fYear :
2007
fDate :
2-5 Nov. 2007
Firstpage :
736
Lastpage :
742
Abstract :
In this paper, we focus on weakly annotated data (WAD), which is typically generated by a (semi-)automated information extraction system from Web documents. The extracted information has a certain level of accuracy, which can be surpassed by using statistical models that are capable of contextual reasoning, such as Bayesian models. Our contribution is an EM algorithm that operates on simple Bayesian models to re-annotate WAD. EM estimates the parameters, i.e., the prior and conditional probabilities, by iterating a Bayesian model on the given Web data. In the expectation step, a Bayesian classifier is trained from the current annotations; in the maximization step, the roles of all the labels are re-annotated to find the best-fitting annotation under the current model, and the probabilities are then re-adjusted from the new annotations. Our experiments show that EM increases Web data annotation accuracy by up to 8%. We use the Baum-Welch methodology in our EM approach.
Keywords :
Bayes methods; Web sites; document handling; expectation-maximisation algorithm; inference mechanisms; information retrieval; parameter estimation; statistical analysis; Baum-Welch style EM algorithm; Bayesian classifier; Bayesian model; Web data annotation; Web document; automated information extraction system; contextual reasoning; parameter estimation; statistical model; weakly annotated data; Bayesian methods; Computer science; Context modeling; Data engineering; Data mining; Digital cameras; Ontologies; Parameter estimation; Statistics; Web pages; Baum-Welch; Bayesian Models; Expectation-Maximization; Weakly annotated data
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence, IEEE/WIC/ACM International Conference on
Conference_Location :
Fremont, CA
Print_ISBN :
978-0-7695-3026-0
Type :
conf
DOI :
10.1109/WI.2007.12
Filename :
4427182