• DocumentCode
    534198
  • Title
    Clustering Efficient Method on Mass Chinese Text Based on Semantic Concept
  • Author
    Jinling, Liu ; Hong, Zhou
  • Author_Institution
    Comput. Eng. Fac., Huaiyin Inst. of Technol., Huaian, China
  • Volume
    2
  • fYear
    2010
  • fDate
    16-18 July 2010
  • Firstpage
    151
  • Lastpage
    155
  • Abstract
    Most current Chinese text clustering algorithms are limited by poor scalability to large data sets and by results that are hard to interpret. This paper presents an efficient Chinese text clustering method based on semantic concepts. Working from the text itself, the method uses the classified hierarchy of subject words in the Thesaurus of Modern Chinese to extract conceptional tuples from a high-dimensional collection of text vectors, forming high-level concepts that express the clustering results. The samples are then partitioned according to these high-level concepts, which completes the text clustering process. While preserving the accuracy of the clustering results, the method greatly reduces the amount of data that must be processed and improves the scalability of the clustering algorithm. Experimental results show that the algorithm achieves satisfactory clustering results and higher efficiency.
  • Keywords
    natural language processing; pattern clustering; text analysis; Chinese text clustering; classified hierarchy subject word; clustering efficient method; high-dimensional text vector collection; mass Chinese text; modern Chinese thesaurus; semantic concept; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Clustering methods; Dictionaries; Semantics; Thesauri; chinese text; classified dictionary; clustering; conceptional tuple; semantic;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology and Applications (IFITA), 2010 International Forum on
  • Conference_Location
    Kunming
  • Print_ISBN
    978-1-4244-7621-3
  • Electronic_ISBN
    978-1-4244-7622-0
  • Type
    conf
  • DOI
    10.1109/IFITA.2010.77
  • Filename
    5634880