DocumentCode
1809595
Title
Information Extraction incorporating Paragraph Feature and Hidden Markov Model
Author
Na, Liu ; Mingyu, Lu ; Huanling, Tang
fYear
2007
fDate
18-21 Sept. 2007
Firstpage
953
Lastpage
956
Abstract
With the continuous growth of data on the Internet, information extraction has become a fundamental and effective means of handling large quantities of text. This paper proposes an information extraction method that incorporates paragraph features and a hidden Markov model. The method takes paragraphs, rather than words, as the unit of analysis; a paragraph is a text sequence saved from a web page after preprocessing. Every paragraph is converted into special tokens, which serve as the observation symbols of the hidden Markov model. All experiments are carried out on a set of EBM Web pages. The extracted information includes title, author, affiliation, journal, etc. The experimental results show that this method can improve precision and recall to some degree.
Keywords
feature extraction; hidden Markov models; information extraction; paragraph feature; Automata; Computer science; Data mining; Filling; IP networks; Parallel processing; Spatial databases; Web pages
fLanguage
English
Publisher
IEEE
Conference_Title
Network and Parallel Computing Workshops, 2007. NPC Workshops. IFIP International Conference on
Conference_Location
Liaoning
Print_ISBN
978-0-7695-2943-1
Type
conf
DOI
10.1109/NPC.2007.109
Filename
4351609