• DocumentCode
    605948
  • Title

    Optimization for Vietnamese text classification problem by reducing features set

  • Author

    Ha Nguyen Thi Thu ; Quynh Nguyen Huu ; Khanh Nguyen Thi Hong ; Hung Le Manh

  • Author_Institution
    Dept. of Comput. Sci., Vietnam Electr. Power Univ., Hanoi, Vietnam
  • fYear
    2012
  • fDate
    23-25 Oct. 2012
  • Firstpage
    209
  • Lastpage
    212
  • Abstract
    Vietnamese is a monosyllabic language, so word segmentation is relatively complex: splitting words on whitespace is inaccurate, and existing Vietnamese segmentation tools are not highly effective. In this paper, we propose a new method that uses only topic words in the computation, both to increase the accuracy of the Vietnamese text classification system and to optimize the computation. The experimental results show that our method is more effective than previously proposed approaches, achieving higher accuracy and reducing computational complexity.
  • Keywords
    classification; computational complexity; natural language processing; optimisation; text analysis; word processing; Vietnamese segmentation tools; Vietnamese text classification problem; Vietnamese text classification system accuracy calculation; calculation process optimization; computational complexity reduction; feature set reduction; split word; topic word; whitespaces; word segmentation; Vietnamese text classification; syllable language
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science and Service Science and Data Mining (ISSDM), 2012 6th International Conference on New Trends in
  • Conference_Location
    Taipei
  • Print_ISBN
    978-1-4673-0876-2
  • Type

    conf

  • Filename
    6528628