Title :
A hybrid approach for web information extraction
Author :
Xiao, Ji-yi ; Zhu, Dao-hui ; Zou, La-mei
Author_Institution :
Sch. of Comput. Sci. & Technol., South China Univ., Hengyang
Abstract :
This paper presents a new hybrid approach to web information extraction based on maximum entropy and the maximum entropy Markov model. The approach overcomes the lower precision and recall of the hidden Markov model, and it can also make full use of various kinds of contextual information from the web. Experiments show that the hybrid approach achieves an average precision of 87.516%, whereas a hidden Markov model trained with the Baum-Welch algorithm achieves an average precision of 68.630%. This indicates that the hybrid approach achieves higher precision than the hidden Markov model trained with the Baum-Welch algorithm.
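The abstract names the ingredients (maximum entropy, maximum entropy Markov model) but not the mechanics. As a hedged illustration only, the sketch below shows generic MEMM decoding: each local distribution P(s_t | s_{t-1}, o_t) is a maximum entropy model that is proportional to exp of the summed weights of active features, and Viterbi search selects the best state sequence. The feature templates, the LABEL/VALUE tag set, the `START` state, and the hand-set weights are hypothetical. The paper's actual features, states, and trained weights are not given in this record (its keywords mention generalized iterative scaling, a training step this sketch omits).

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical feature extractor: (observation, previous state, candidate state) -> active feature names.
FeatureFn = Callable[[str, str, str], List[str]]


def next_state_dist(obs: str, prev: str, states: Sequence[str],
                    weights: Dict[str, float], features: FeatureFn) -> Dict[str, float]:
    """Maximum entropy distribution P(s | prev, obs), proportional to exp(sum of active feature weights)."""
    scores = {s: sum(weights.get(f, 0.0) for f in features(obs, prev, s)) for s in states}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {s: math.exp(v - top) for s, v in scores.items()}
    z = sum(exps.values())  # normalizer Z(obs, prev)
    return {s: e / z for s, e in exps.items()}


def memm_viterbi(observations: Sequence[str], states: Sequence[str], start: str,
                 weights: Dict[str, float], features: FeatureFn) -> List[str]:
    """Most probable state sequence under an MEMM, via Viterbi over log P(s_t | s_{t-1}, o_t)."""
    first = next_state_dist(observations[0], start, states, weights, features)
    delta = {s: math.log(first[s]) for s in states}  # best log-prob of a path ending in s
    backpointers: List[Dict[str, str]] = []
    for obs in observations[1:]:
        # One local maximum entropy distribution per possible previous state.
        dists = {p: next_state_dist(obs, p, states, weights, features) for p in states}
        new_delta, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: delta[p] + math.log(dists[p][s]))
            new_delta[s] = delta[best_prev] + math.log(dists[best_prev][s])
            ptr[s] = best_prev
        backpointers.append(ptr)
        delta = new_delta
    # Trace back from the best final state.
    path = [max(states, key=lambda s: delta[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Toy usage: tag tokens of a page fragment as LABEL or VALUE (hand-set weights, not trained).
def toy_features(obs: str, prev: str, s: str) -> List[str]:
    return [f"word={obs}|state={s}", f"prev={prev}|state={s}"]


TOY_WEIGHTS = {
    "word=Title:|state=LABEL": 3.0,
    "word=Author:|state=LABEL": 3.0,
    "prev=LABEL|state=VALUE": 2.0,
    "prev=VALUE|state=VALUE": 0.5,
}

tokens = ["Title:", "Hybrid", "approach", "Author:", "Xiao"]
tags = memm_viterbi(tokens, ["LABEL", "VALUE"], "START", TOY_WEIGHTS, toy_features)
print(tags)  # ['LABEL', 'VALUE', 'VALUE', 'LABEL', 'VALUE']
```

Unlike an HMM, which models P(o_t | s_t) and is typically trained with Baum-Welch, each MEMM step conditions directly on the observation, so arbitrary overlapping contextual features can be added without independence assumptions between them.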
Keywords :
Internet; hidden Markov models; information retrieval; knowledge acquisition; Web information extraction; hidden Markov model; maximum entropy method; Computer science; Cybernetics; Data mining; Electronic mail; Entropy; Hidden Markov models; Iterative algorithms; Machine learning; Probability distribution; Training data; Generalized iterative scaling; Hidden Markov model; Information extraction; Maximum entropy; Maximum entropy Markov model;
Conference_Title :
Machine Learning and Cybernetics, 2008 International Conference on
Conference_Location :
Kunming
Print_ISBN :
978-1-4244-2095-7
Electronic_ISBN :
978-1-4244-2096-4
DOI :
10.1109/ICMLC.2008.4620654