DocumentCode :
3385034
Title :
XML Structure Extraction from Plain Texts with Hidden Markov Model
Author :
Piao Yong ; Zou Sha-sha ; Wang Xiu-Kun
Author_Institution :
EI / Software Sch., Dalian Univ. of Technol., Dalian, China
Volume :
1
fYear :
2010
fDate :
23-24 Oct. 2010
Firstpage :
560
Lastpage :
564
Abstract :
Information extraction is one of the ways to convert unstructured text into structured records. Most previous work in this field is devoted to adding semantic tags to specific textual content, so the resulting structures are often flat and cannot illustrate relationships among semantic features. This paper proposes a novel approach, Structure Information Extraction System based on Hidden Markov Model (SIEHMM), for the task of extracting structure from plain texts; it utilizes path information for HMM training and automatically generates XML. Experiments on a real-life dataset show that SIEHMM achieves high precision and recall, and that it can not only help solve problems of structural storage and text information retrieval but also take advantage of XML to meet future trends.
Keywords :
XML; hidden Markov models; text analysis; SIEHMM; XML structure extraction; plain texts; semantic tags; structure information extraction system based on hidden Markov model; structured records; textual content; unstructured text; Data mining; Semantics; Training; Training data; Viterbi algorithm; Hidden Markov Model; Structure Information Extraction
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Artificial Intelligence and Computational Intelligence (AICI), 2010 International Conference on
Conference_Location :
Sanya
Print_ISBN :
978-1-4244-8432-4
Type :
conf
DOI :
10.1109/AICI.2010.123
Filename :
5654780
Link To Document :