DocumentCode :
2262703
Title :
Web object information extraction based on generalized hidden Markov model
Author :
Wang, Jing ; Yao, Yong ; Liu, Zhijing
Author_Institution :
Xidian Univ., Xi'an
fYear :
2007
fDate :
17-19 Oct. 2007
Firstpage :
1520
Lastpage :
1523
Abstract :
Because Web pages differ from plain-text documents, this paper introduces the concept of a Web object. The assumptions on state transitions and emission symbols are then improved on the basis of the generalized hidden Markov model (GHMM), and a novel Web object information extraction method is proposed. Finally, an example shows that the proposed method achieves very high precision for Web object information extraction.
Keywords :
Internet; hidden Markov models; information retrieval; Web object information extraction; Web page; emission symbol condition; generalized hidden Markov model; plain text document; state transition; Data mining; Hidden Markov models; Information technology; Generalized Hidden Markov Model (GHMM); Hidden Markov Model; Information Extraction; Web Object;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Communications and Information Technologies, 2007. ISCIT '07. International Symposium on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-0976-1
Electronic_ISBN :
978-1-4244-0977-8
Type :
conf
DOI :
10.1109/ISCIT.2007.4392257
Filename :
4392257