Title :
Improvement of the dotplotting method for linear text segmentation
Author :
Ye, Na ; Zhu, Jingbo ; Luo, Haitao ; Wang, Huizhen ; Zhang, Bin
Author_Institution :
Natural Language Process. Lab., Inst. of Comput. Software & Theor., China
fDate :
30 Oct.-1 Nov. 2005
Abstract :
The dotplotting method, employed by Reynar (1994), is a state-of-the-art algorithm for automatic linear text segmentation. However, its measure for assessing density, which represents topical coherence, has several problems: the density function is asymmetric, leading to the apparently false conclusion that a forward scan may produce a different segmentation from a backward scan; in addition, when determining the next boundary, the assessment strategy doesn't adequately take the previously located boundaries into account. In this paper we propose modified models that remedy these problems. We also make use of segment length to improve segmentation performance. Experimental results show that the modified models achieve considerable improvement in Pk value, precision, and recall over the original dotplotting method.
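The abstract reports results in terms of the Pk value without defining it. The following is a minimal sketch of Pk as it is commonly defined in the text segmentation literature (the probability that two units k positions apart are inconsistently classified as belonging to the same or different segments); it is not taken from the paper. The function names `segment_ids` and `pk`, the representation of a segmentation as a list of segment lengths, and the default choice of k as roughly half the average reference segment length are all assumptions made for illustration.

```python
def segment_ids(lengths):
    """Expand segment lengths, e.g. [3, 2], into per-unit segment ids [0, 0, 0, 1, 1]."""
    ids = []
    for seg_index, length in enumerate(lengths):
        ids.extend([seg_index] * length)
    return ids


def pk(ref_lengths, hyp_lengths, k=None):
    """Illustrative Pk score: lower is better, 0.0 means no window disagrees.

    ref_lengths, hyp_lengths: segment lengths (in sentences or other units)
    of the reference and hypothesized segmentations of the same text.
    """
    ref = segment_ids(ref_lengths)
    hyp = segment_ids(hyp_lengths)
    if len(ref) != len(hyp):
        raise ValueError("both segmentations must cover the same number of units")
    n = len(ref)
    if k is None:
        # Assumed convention: about half the average reference segment length.
        k = max(1, n // (2 * len(ref_lengths)))
    windows = n - k
    if windows <= 0:
        raise ValueError("text is too short for the chosen window size k")
    # Count windows where reference and hypothesis disagree on whether
    # units i and i + k fall in the same segment.
    errors = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k])
        for i in range(windows)
    )
    return errors / windows


if __name__ == "__main__":
    print(pk([5, 5], [5, 5]))  # identical segmentations
    print(pk([5, 5], [10]))    # hypothesis misses the only boundary
```

Under these assumptions, a hypothesis identical to the reference scores 0.0, and missed or spurious boundaries raise the score toward 1.0.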
Keywords :
text analysis; automatic linear text segmentation; dotplotting method; topical coherence; Coherence; Computer applications; Density functional theory; Density measurement; Electronic mail; Information resources; Laboratories; Natural language processing; Software algorithms; Text processing
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
Print_ISBN :
0-7803-9361-9
DOI :
10.1109/NLPKE.2005.1598814