Title :
Rule based approach for text segmentation on Indonesian news article using named entity distribution
Author :
Saniati ; Purwarianti, Ayu
Author_Institution :
Sch. of Electron. & Informatic Eng., Bandung Inst. of Technol., Bandung, Indonesia
Abstract :
Finding a good paragraph structure, or text segmentation, is important in computational linguistics research, mainly in areas such as information retrieval, question answering, and summarization. We propose text segmentation by subtopic movement detection based on lexical and named entity distribution. The main contributions of this research are the use of named entities (with reference resolution) and a voting method in measuring text segment similarity, and the redefinition of rules for text boundary identification in Indonesian news articles. The experiments were conducted on 52 articles from Indonesian online news. The experimental results achieved 76.8%-79.55% accuracy, compared to 60.37%-60.83% for the baseline taken from other research.
Keywords :
computational linguistics; knowledge based systems; text analysis; Indonesian news article; computational linguistic research; named entity distribution; paragraph structure; rule based approach; text segmentation; Accuracy; Equations; Mathematical model; Mutual information; Vocabulary; named entity
Conference_Title :
Data and Software Engineering (ICODSE), 2014 International Conference on
Print_ISBN :
978-1-4799-8175-5
DOI :
10.1109/ICODSE.2014.7062668