Title :
Research of Information Extraction Algorithm based on Hidden Markov Model
Author :
Zhou, CaiLan ; Li, Shasha
Author_Institution :
Computer & Science Technology Department, Wuhan University of Technology, Hubei, China
Abstract :
Building on research into Web information extraction algorithms based on the Hidden Markov Model (HMM), this paper focuses on the application of HMMs to text information extraction. It improves the extraction method by constructing a DOM tree of refined granularity and combining it with regular expressions to extract detailed information points. In addition, the probabilities of unknown observations are smoothed. Test results show that the improved HMM achieves better extraction performance.
Keywords :
Algorithm design and analysis; Data mining; Data models; HTML; Hidden Markov models; Training; Web pages; DOM Tree; Hidden Markov Model; Information Extraction
Conference_Title :
Information Science and Engineering (ICISE), 2010 2nd International Conference on
Conference_Location :
Hangzhou, China
Print_ISBN :
978-1-4244-7616-9
DOI :
10.1109/ICISE.2010.5690348