Information Extraction incorporating Paragraph Feature and Hidden Markov Model

Author

Na, Liu; Mingyu, Lu; Huanling, Tang

fYear

2007

fDate

18-21 Sept. 2007

Firstpage

953

Lastpage

956

Abstract

With the continuous growth of data on the Internet, information extraction has become a fundamental and effective means of handling large volumes of text. This paper puts forward an information extraction method that incorporates paragraph features into a hidden Markov model. The method takes paragraphs instead of words as its research object, where a paragraph is a text sequence saved from a web page after preprocessing. Every paragraph is converted into special tokens, and these tokens serve as the observation symbols of the hidden Markov model. All experiments are carried out on the EBM web page set. The extracted information includes title, author, affiliation, journal, etc. The experimental results show that this method can improve precision and recall to some degree.
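The abstract only outlines the pipeline (paragraph, then token, then HMM observation). The sketch below illustrates that idea with standard Viterbi decoding over paragraph-level tokens. It is not the authors' implementation: the feature rules in `paragraph_to_token`, the token alphabet, the state set, and every probability are hypothetical toy values, not parameters trained on the EBM pages.

```python
import math

# Hypothetical hidden states: one per field to extract, plus a catch-all.
STATES = ["title", "author", "affiliation", "journal", "other"]

# Toy parameters (assumptions, NOT learned from the EBM data set).
START = {"title": 0.6, "author": 0.1, "affiliation": 0.05, "journal": 0.15, "other": 0.1}
TRANS = {
    "title":       {"title": 0.1, "author": 0.7, "affiliation": 0.05, "journal": 0.05, "other": 0.1},
    "author":      {"title": 0.05, "author": 0.1, "affiliation": 0.7, "journal": 0.05, "other": 0.1},
    "affiliation": {"title": 0.05, "author": 0.05, "affiliation": 0.2, "journal": 0.3, "other": 0.4},
    "journal":     {"title": 0.3, "author": 0.05, "affiliation": 0.05, "journal": 0.1, "other": 0.5},
    "other":       {"title": 0.2, "author": 0.05, "affiliation": 0.05, "journal": 0.2, "other": 0.5},
}
EMIT = {
    "title":       {"SHORT": 0.8, "NAMES": 0.05, "ORG": 0.05, "CITE": 0.05, "LONG": 0.05},
    "author":      {"SHORT": 0.1, "NAMES": 0.8, "ORG": 0.04, "CITE": 0.03, "LONG": 0.03},
    "affiliation": {"SHORT": 0.1, "NAMES": 0.1, "ORG": 0.75, "CITE": 0.02, "LONG": 0.03},
    "journal":     {"SHORT": 0.1, "NAMES": 0.02, "ORG": 0.03, "CITE": 0.8, "LONG": 0.05},
    "other":       {"SHORT": 0.3, "NAMES": 0.05, "ORG": 0.05, "CITE": 0.1, "LONG": 0.5},
}


def paragraph_to_token(paragraph: str) -> str:
    """Map one preprocessed paragraph to a coarse observation symbol.

    The features (keywords, digits, commas, length) are illustrative
    assumptions, not the paragraph features defined in the paper.
    """
    text = paragraph.strip()
    lower = text.lower()
    if any(k in lower for k in ("university", "department", "institute", "hospital")):
        return "ORG"
    if any(ch.isdigit() for ch in text) and ";" in text:
        return "CITE"
    if text.count(",") >= 2 and len(text.split()) <= 20:
        return "NAMES"
    if len(text.split()) <= 25:
        return "SHORT"
    return "LONG"


def viterbi(tokens: list[str]) -> list[str]:
    """Return the most likely state sequence for the tokens (log-space Viterbi)."""
    if not tokens:
        return []
    # best[s] = (log-probability, path) of the best path ending in state s
    best = {s: (math.log(START[s]) + math.log(EMIT[s][tokens[0]]), [s]) for s in STATES}
    for tok in tokens[1:]:
        new_best = {}
        for s in STATES:
            # Pick the predecessor that maximises the path score into s
            score, path = max(
                ((best[p][0] + math.log(TRANS[p][s]), best[p][1]) for p in STATES),
                key=lambda item: item[0],
            )
            new_best[s] = (score + math.log(EMIT[s][tok]), path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


if __name__ == "__main__":
    paragraphs = [
        "Effect of aspirin on stroke risk: a randomized trial",
        "Smith J, Brown K, Lee M",
        "Department of Medicine, Harvard University, Boston, MA",
        "J Clin Oncol 2005;23(4):100-110",
    ]
    tokens = [paragraph_to_token(p) for p in paragraphs]
    for para, label in zip(paragraphs, viterbi(tokens)):
        print(f"{label:12} {para}")
```

With these toy values the four sample paragraphs decode to title, author, affiliation and journal. Working at the paragraph level keeps the observation sequence short, which is the point the abstract makes about using paragraphs rather than words.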

Keywords

feature extraction; hidden Markov models; information extraction; paragraph feature; Automata; Computer science; Data mining; Filling; IP networks; Parallel processing; Spatial databases; Web pages

fLanguage

English

Publisher

IEEE

Conference_Title

Network and Parallel Computing Workshops, 2007. NPC Workshops. IFIP International Conference on

Conference_Location

Liaoning

Print_ISBN

978-0-7695-2943-1

Type

conf

DOI

10.1109/NPC.2007.109

Filename

4351609

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=1809595