DocumentCode :
476079
Title :
A hybrid approach for web information extraction
Author :
Xiao, Ji-yi ; Zhu, Dao-hui ; Zou, La-mei
Author_Institution :
Sch. of Comput. Sci. & Technol., South China Univ., Hengyang
Volume :
3
fYear :
2008
fDate :
12-15 July 2008
Firstpage :
1560
Lastpage :
1563
Abstract :
This paper presents a new approach to web information extraction based on maximum entropy and the maximum entropy Markov model. The approach not only overcomes the lower precision and recall of the hidden Markov model but also makes full use of the various kinds of contextual information available on the web. Experiments show that the hybrid approach achieves an average precision of 87.516%, whereas the hidden Markov model trained with the Baum-Welch algorithm achieves an average precision of 68.630%. This indicates that the hybrid approach outperforms, in precision, the hidden Markov model trained with the Baum-Welch algorithm.
Keywords :
Internet; hidden Markov models; information retrieval; knowledge acquisition; Web information extraction; hidden Markov model; maximum entropy method; Computer science; Cybernetics; Data mining; Electronic mail; Entropy; Hidden Markov models; Iterative algorithms; Machine learning; Probability distribution; Training data; Generalized iterative scaling; Hidden Markov model; Information extraction; Maximum entropy; Maximum entropy Markov model;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Cybernetics, 2008 International Conference on
Conference_Location :
Kunming
Print_ISBN :
978-1-4244-2095-7
Electronic_ISBN :
978-1-4244-2096-4
Type :
conf
DOI :
10.1109/ICMLC.2008.4620654
Filename :
4620654