• DocumentCode
    3102130
  • Title

    Research on Text Hierarchical Topic Identification Algorithm Based on the Dynamic Diverse Thresholds Clustering

  • Author

    Yong-Dong, XU ; Guang-Ri, QUAN ; Zhi-Ming, Xu ; Ya-Dong, WANG

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Weihai, China
  • fYear
    2009
  • fDate
    7-9 Dec. 2009
  • Firstpage
    206
  • Lastpage
    210
  • Abstract
    Text topic identification is a common problem in many NLP applications. Traditional topic identification methods generate a single-layer topic structure, which often divides topics inaccurately, even when it is produced manually by human experts. This paper first proposes the concept of a hierarchical topic, which uses a multi-layer topic tree to represent a text or a text set. Second, it proposes an iterative text-unit clustering method that automatically recognizes the hierarchical topics of a text set. In this method, clustering pauses once each topic in the text set has been correctly divided into multiple sub-topics, and the process continues until a hierarchical topic tree has been built. The difficult problem of automatically determining the multiple pause thresholds is resolved with a minimized clustering entropy method. Experimental results demonstrate the effectiveness of the method.
  • Keywords
    entropy; natural language processing; pattern clustering; text analysis; tree data structures; dynamic diverse thresholds clustering; iterative text units clustering; minimized clustering entropy method; multi-layer topic tree structure; single-layered topic structure; text hierarchical topic identification algorithm; Application software; Clustering algorithms; Clustering methods; Computer science; Entropy; Humans; Iterative methods; Layout; Text recognition; Tree data structures; Hierarchical topic; Multi-threshold identification; Text clustering; Text topic identification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asian Language Processing, 2009. IALP '09. International Conference on
  • Conference_Location
    Singapore
  • Print_ISBN
    978-0-7695-3904-1
  • Type

    conf

  • DOI
    10.1109/IALP.2009.50
  • Filename
    5380766
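
The abstract describes iterative clustering that pauses at automatically chosen thresholds, with each pause threshold selected by minimizing a clustering entropy. The following is a minimal sketch of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the bag-of-words vectors, the cosine similarity, the centroid-linkage merging, the candidate threshold grid, the function names (`cluster`, `clustering_entropy`, `build_topic_tree`), and in particular the entropy formula, which is a simple stand-in because the record does not give the paper's definition.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Sum of word counts; the scale does not matter for cosine."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

def h(s):
    """Binary entropy of a similarity clipped to [0, 1] (assumed measure)."""
    s = min(max(s, 0.0), 1.0)
    if s in (0.0, 1.0):
        return 0.0
    return -s * math.log(s) - (1 - s) * math.log(1 - s)

def cluster(vectors, ids, threshold):
    """Merge the closest clusters until the best similarity < threshold."""
    clusters = [[i] for i in ids]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            cx = centroid([vectors[i] for i in clusters[x]])
            for y in range(x + 1, len(clusters)):
                cy = centroid([vectors[i] for i in clusters[y]])
                s = cosine(cx, cy)
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break  # the "pause": nothing left is similar enough to merge
        x, y = pair
        merged = clusters.pop(y)
        clusters[x] = clusters[x] + merged
    return clusters

def clustering_entropy(vectors, clusters):
    """Stand-in entropy: intra-cluster plus inter-cluster terms (assumed)."""
    all_c = centroid([vectors[i] for c in clusters for i in c])
    total = 0.0
    for c in clusters:
        cc = centroid([vectors[i] for i in c])
        total += sum(h(cosine(vectors[i], cc)) for i in c)  # intra-cluster
        total += h(cosine(cc, all_c))                       # inter-cluster
    return total

def build_topic_tree(vectors, ids, candidates=(0.2, 0.4, 0.6, 0.8)):
    """Recursively split ids; each level picks its own pause threshold."""
    if len(ids) < 2:
        return list(ids)
    best = min((cluster(vectors, ids, t) for t in candidates),
               key=lambda cl: clustering_entropy(vectors, cl))
    if len(best) == 1 or len(best) == len(ids):
        return list(ids)  # no useful split at this level: make a leaf
    return [build_topic_tree(vectors, c, candidates) for c in best]

def leaves(tree):
    """Flatten a nested topic tree back into a list of unit ids."""
    out = []
    for node in tree:
        out.extend(leaves(node) if isinstance(node, list) else [node])
    return out

texts = [
    "stock market price rise",
    "stock market price fall",
    "football match goal score",
    "football match goal win",
]
vectors = [Counter(t.split()) for t in texts]
tree = build_topic_tree(vectors, list(range(len(texts))))
print(tree)
```

Because every subtree calls `build_topic_tree` on its own units, each level of the tree can settle on a different threshold, which is one plausible reading of "dynamic diverse thresholds". A fixed candidate grid is used here only to keep the sketch short.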