• DocumentCode
    3317683
  • Title
    Application of the Character-Level Statistical Method in Text Categorization
  • Author
    Yang, Zhen ; Nie, Xiangfei ; Xu, Weiran ; Guo, Jun
  • Author_Institution
    Sch. of Inf. Eng., Beijing Univ. of Posts & Telecommun.
  • Volume
    2
  • fYear
    2006
  • fDate
    3-6 Nov. 2006
  • Firstpage
    1412
  • Lastpage
    1417
  • Abstract
    It is generally thought that semantic and grammatical information is very significant for better understanding and processing of text. But in simple text categorization tasks, the absence of this information does not always degrade classifier performance. In this paper, we discuss the application of the character-level statistical method in text categorization, which extracts character-level frequent patterns rather than considering semantic and grammatical information. Compared with the traditional n-gram model, the presented method is easy and convenient. Then, by casting the character-level statistical method in the Bayesian theory framework, the proposed method is applied to spam detection. Finally, we discuss the multiclass problem in short message categorization based on combination strategies. The effectiveness of the models and the feasibility of the presented method are verified.
  • Keywords
    Bayes methods; natural language processing; pattern recognition; statistical analysis; text analysis; Bayesian theory; character-level frequent pattern extraction; character-level statistical method; grammatical information; semantic information; short message categorization; spam detection; text categorization; Bayesian methods; Casting; Data mining; Degradation; Feature extraction; Information processing; Natural languages; Statistical analysis; Text categorization; Text processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Security, 2006 International Conference on
  • Conference_Location
    Guangzhou
  • Print_ISBN
    1-4244-0605-6
  • Electronic_ISBN
    1-4244-0605-6
  • Type
    conf
  • DOI
    10.1109/ICCIAS.2006.295293
  • Filename
    4076199