DocumentCode :
2465012
Title :
Web information extraction based on hidden Markov model
Author :
Lai, Jianbing ; Liu, Qiang ; Liu, Yi
Author_Institution :
School of Software, Tsinghua University, Beijing, China
fYear :
2010
fDate :
14-16 April 2010
Firstpage :
234
Lastpage :
238
Abstract :
This paper proposes a semantic-block-based hidden Markov model. Semantic blocks are segmented from the information extracted from various websites, based on the semi-structured nature of their pages. The model uses the semantic block, rather than the word, as the basic element of an observation sequence, in order to improve the accuracy and efficiency of the transition matrix. It also optimizes the observation probability distribution and the estimation accuracy of the state transition sequence by adopting a “voting strategy” and modifying the Viterbi algorithm. Experimental results show that the new model and algorithms give satisfactory recall and precision for web information extraction.
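For orientation, the following is a minimal sketch of the standard (unmodified) Viterbi algorithm applied to a sequence of block-level observations. It is not the paper's modified algorithm, and it does not implement the voting strategy. The state names, block labels, and probabilities used with it are hypothetical, and the observation sequence is assumed to be non-empty.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the most likely state sequence for a non-empty observation sequence.

    Each observation stands for one semantic block (hypothetical labels),
    not a single word. Unseen observations get emission probability 0.
    """
    # trellis[t][s] = (probability of the best path ending in s at step t,
    #                  previous state on that path)
    trellis: List[Dict[str, Tuple[float, Optional[str]]]] = [
        {s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None) for s in states}
    ]

    for obs in observations[1:]:
        prev = trellis[-1]
        column: Dict[str, Tuple[float, Optional[str]]] = {}
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            prob, back = max(
                ((prev[r][0] * trans_p[r][s], r) for r in states),
                key=lambda pair: pair[0],
            )
            column[s] = (prob * emit_p[s].get(obs, 0.0), back)
        trellis.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: trellis[-1][s][0])
    path = [last]
    for column in reversed(trellis[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path
```

Called with two hypothetical states such as "title" and "price" and block labels such as "heading_block" and "number_block", the function returns one state label per block.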
Keywords :
Algorithm design and analysis; Collaborative work; Data mining; Dictionaries; Hidden Markov models; Internet; Probability distribution; State estimation; Viterbi algorithm; Voting; hidden Markov model; semantic block; semi-structure; voting strategy;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Supported Cooperative Work in Design (CSCWD), 2010 14th International Conference on
Conference_Location :
Shanghai, China
Print_ISBN :
978-1-4244-6763-1
Type :
conf
DOI :
10.1109/CSCWD.2010.5471969
Filename :
5471969