DocumentCode
3318270
Title
Improvement of the dotplotting method for linear text segmentation
Author
Ye, Na ; Zhu, Jingbo ; Luo, Haitao ; Wang, Huizhen ; Zhang, Bin
Author_Institution
Natural Language Process. Lab., Inst. of Comput. Software & Theor., China
fYear
2005
fDate
30 Oct.-1 Nov. 2005
Firstpage
636
Lastpage
641
Abstract
The dotplotting method, employed by Reynar (1994), is a state-of-the-art algorithm for automatic linear text segmentation. However, its measure for assessing density, which represents topical coherence, has several problems. The density function is asymmetric, which leads to the apparently false conclusion that a forward scan may produce a different segmentation from a backward scan. In addition, when determining the next boundary, the assessment strategy does not adequately take the previously located boundaries into account. In this paper we propose modified models that remedy these problems. We also make use of segment length to improve segmentation performance. Experimental results show that the modified models achieve considerable improvement in Pk value, precision, and recall over the original dotplotting method.
Keywords
text analysis; automatic linear text segmentation; dotplotting method; topical coherence; Coherence; Computer applications; Density functional theory; Density measurement; Electronic mail; Information resources; Laboratories; Natural language processing; Software algorithms; Text processing
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
Print_ISBN
0-7803-9361-9
Type
conf
DOI
10.1109/NLPKE.2005.1598814
Filename
1598814