DocumentCode
3102130
Title
Research on Text Hierarchical Topic Identification Algorithm Based on the Dynamic Diverse Thresholds Clustering
Author
Yong-Dong, XU ; Guang-Ri, QUAN ; Zhi-Ming, Xu ; Ya-Dong, WANG
Author_Institution
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Weihai, China
fYear
2009
fDate
7-9 Dec. 2009
Firstpage
206
Lastpage
210
Abstract
Text topic identification is a common problem in many NLP applications. Traditional topic identification methods generate a single-layered topic structure, which often divides topics inaccurately even when it is produced manually by human experts. This paper first proposes the concept of a hierarchical topic, which uses a multi-layer topic tree structure to represent a text or text set. Second, it proposes an iterative text-unit clustering method that automatically recognizes the hierarchical topics of a text set. In this method, clustering pauses whenever each topic in the text set has been correctly divided into multiple sub-topics, and then continues until a complete hierarchical topic tree has been built. The difficult problem of automatically determining the multiple pause thresholds is resolved with the minimized clustering entropy method. Experimental results demonstrate the effectiveness of the method.
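The iterative procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D points stand in for text-unit vectors, centroid-linkage agglomeration is an assumed clustering step, and `score` is a hypothetical stand-in for the clustering entropy, whose definition the abstract does not give. The names `score`, `best_partition`, and `build_tree` are illustrative only.

```python
import math


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def score(clusters):
    """Hypothetical stand-in for a clustering entropy (lower is better).

    Assumption: within-cluster spread plus spread of the cluster centroids
    around the global centroid. The paper's own measure may differ.
    """
    g = centroid([p for c in clusters for p in c])
    intra = sum(math.dist(p, centroid(c)) for c in clusters for p in c)
    inter = sum(math.dist(centroid(c), g) for c in clusters)
    return intra + inter


def best_partition(points):
    """Agglomerate bottom-up and return the partition with the lowest score.

    The chosen partition plays the role of one automatically selected
    pause threshold.
    """
    clusters = [[p] for p in points]
    best, best_score = [list(c) for c in clusters], score(clusters)
    while len(clusters) > 1:
        # Merge the two clusters whose centroids are closest.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: math.dist(centroid(clusters[ab[0]]), centroid(clusters[ab[1]])),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        s = score(clusters)
        if s < best_score:
            best, best_score = [list(c) for c in clusters], s
    return best


def build_tree(points):
    """Recursively split each topic into sub-topics to form a topic tree."""
    if len(points) < 3:
        return sorted(points)
    parts = best_partition(points)
    if len(parts) == 1 or len(parts) == len(points):
        return sorted(points)  # no useful split was found: treat as a leaf
    return [build_tree(part) for part in parts]


# Toy data: two well-separated topics, each containing two sub-topics.
points = [
    (0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0),
    (100, 100), (100, 101), (101, 100), (110, 100), (110, 101), (111, 100),
]
tree = build_tree(points)
print(tree)
```

Each call to `best_partition` selects its own stopping level from the data, so different branches of the tree can pause at different thresholds, which is the behavior the abstract attributes to the dynamic diverse thresholds.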
Keywords
entropy; natural language processing; pattern clustering; text analysis; tree data structures; dynamic diverse thresholds clustering; iterative text units clustering; minimized clustering entropy method; multi-layer topic tree structure; single-layered topic structure; text hierarchical topic identification algorithm; Application software; Clustering algorithms; Clustering methods; Computer science; Entropy; Humans; Iterative methods; Layout; Text recognition; Tree data structures; Hierarchical topic; Multi-threshold identification; Text clustering; Text topic identification;
fLanguage
English
Publisher
IEEE
Conference_Titel
Asian Language Processing, 2009. IALP '09. International Conference on
Conference_Location
Singapore
Print_ISBN
978-0-7695-3904-1
Type
conf
DOI
10.1109/IALP.2009.50
Filename
5380766