DocumentCode
493507
Title
Subsequence-Based Text Segmentation and Labeling
Author
Chen, Xi ; Chen, Shihong
Author_Institution
Comput. Sch., Wuhan Univ., Wuhan
Volume
1
fYear
2009
fDate
7-8 March 2009
Firstpage
582
Lastpage
587
Abstract
Text segmentation is important for many natural language processing tasks, such as passage retrieval and summarization. This paper uses a suffix tree model for text representation and introduces a new measure, subsequence-based coherence, which represents the coherence between sentences and exploits word order information. It also introduces a text segmentation algorithm, subsequence-based maximum cut, and a passage labeling approach based on subsequences. Segmentation results on educational text show that the method outperforms some existing methods, and the passage labeling results are promising.
Keywords
natural language processing; text analysis; trees (mathematics); natural language processing task; passage labeling approach; subsequence-based coherence; subsequence-based maximum cut; subsequence-based text segmentation; suffix tree model; text labeling; text representation; word order information; Books; Coherence; Computer science; Computer science education; Educational technology; Information retrieval; Intelligent systems; Labeling; Natural language processing; Supervised learning; maximum cut; passage labeling; sentence coherence; subsequence; text segmentation
fLanguage
English
Publisher
IEEE
Conference_Titel
Education Technology and Computer Science, 2009. ETCS '09. First International Workshop on
Conference_Location
Wuhan, Hubei
Print_ISBN
978-1-4244-3581-4
Type
conf
DOI
10.1109/ETCS.2009.138
Filename
4958841