Title :
Web object information extraction based on generalized hidden Markov model
Author :
Wang, Jing ; Yao, Yong ; Liu, Zhijing
Author_Institution :
Xidian Univ., Xi'an
Abstract :
Because Web pages differ from plain-text documents, this paper introduces the concept of a Web object. The assumptions on state transitions and emission symbols are improved based on the generalized hidden Markov model (GHMM), and a novel Web object information extraction method is proposed. Finally, an example shows that the proposed method achieves very high precision for Web object information extraction.
Keywords :
Internet; hidden Markov models; information retrieval; Web object information extraction; Web page; emission symbol condition; generalized hidden Markov model; plain text document; state transition; Data mining; Hidden Markov models; Information technology; Generalized Hidden Markov Model (GHMM); Hidden Markov Model; Information Extraction; Web Object;
Conference_Title :
Communications and Information Technologies, 2007. ISCIT '07. International Symposium on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-0976-1
Electronic_ISBN :
978-1-4244-0977-8
DOI :
10.1109/ISCIT.2007.4392257