• DocumentCode
    3306718
  • Title

    Research on short text classification for web forum

  • Author

    Xiaochun He ; Conghui Zhu ; Tiejun Zhao

  • Author_Institution
    MOE-MS Key Lab. of Natural Language Process. & Speech, Harbin Inst. of Technol., Harbin, China
  • Volume
    2
  • fYear
    2011
  • fDate
    26-28 July 2011
  • Firstpage
    1052
  • Lastpage
    1056
  • Abstract
    The unique characteristics of short text make short text classification quite different from traditional long text processing. The feature space of short text is very sparse, which makes it notoriously difficult to extract sufficient and effective features. In this paper, aiming to classify short text on web forums accurately, a novel short-text-processing method based on semantic extension is introduced to enrich the content of the original short text, which effectively solves the feature sparsity problem. In addition, we put forward the concept of Key-Pattern (KP) and propose a new text feature representation approach based on KP, which extracts phrases with strong semantic information as the text features. Traditional classifier models are applied to classify the text, and experimental results show that the proposed method effectively improves the accuracy and recall of short text classification.
  • Keywords
    Internet; feature extraction; pattern classification; text analysis; Web forum; classifier model; feature sparse problem; key-pattern concept; long text processing; semantic extension; short text classification; short-text-processing method; text feature representation approach; Classification algorithms; Noise measurement; Semantics; Text categorization; Key-Pattern; Text representation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-61284-180-9
  • Type

    conf

  • DOI
    10.1109/FSKD.2011.6019652
  • Filename
    6019652