• DocumentCode
    3417245
  • Title

    Internet news headlines classification method based on the N-Gram language model

  • Author

    Liu, Xin ; Rujia, Gao ; Liufu, Song

  • Author_Institution
    Comput. Sci. & Eng., Changchun Univ. of Technol., Changchun, China
  • fYear
    2012
  • fDate
    24-26 Aug. 2012
  • Firstpage
    826
  • Lastpage
    828
  • Abstract
    This paper addresses the classification of Internet news headlines, a form of short text. After analyzing traditional classification models and the characteristics of Internet news headlines, it proposes using the N-Gram language model as the classification model for such headlines. The classification process is divided into three modules: a preprocessing module, a training module, and a prediction module. A classification algorithm for Internet news headlines is designed on the basis of the N-Gram language model. The algorithm classifies a headline by calculating the probability of the unclassified word string under each category C; in calculating this probability it also takes into account the dependence of each word on the preceding term, which the paper credits for its better classification performance.
  • Keywords
    Internet; computational linguistics; information resources; pattern classification; probability; text analysis; Internet news headlines text classification method; N-Gram language model; probability value; Editorials; Frequency statistics; Internet news headlines; N-Gram language model; short text classification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Information Processing (CSIP), 2012 International Conference on
  • Conference_Location
    Xi'an, Shaanxi
  • Print_ISBN
    978-1-4673-1410-7
  • Type

    conf

  • DOI
    10.1109/CSIP.2012.6308980
  • Filename
    6308980
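
The abstract describes three modules (preprocessing, training, prediction) and a score based on the probability of a headline's word string under each category C, conditioned on the previous term. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a bigram model (N = 2), whitespace tokenization, add-one smoothing, and a category prior taken from training frequencies; none of these choices is stated in the record. The class name `BigramHeadlineClassifier` and its methods are hypothetical.

```python
import math
from collections import Counter, defaultdict


class BigramHeadlineClassifier:
    """Illustrative bigram classifier; all design choices are assumptions."""

    def __init__(self):
        self.context_counts = defaultdict(Counter)  # category -> count of each previous word
        self.bigram_counts = defaultdict(Counter)   # category -> count of (previous, word) pairs
        self.headline_counts = Counter()            # category -> number of training headlines
        self.vocab = set()

    @staticmethod
    def preprocess(headline):
        # Preprocessing module (assumed): lowercase, split on whitespace,
        # and prepend a start marker so the first word has a context.
        return ["<s>"] + headline.lower().split()

    def train(self, headline, category):
        # Training module: collect per-category bigram statistics.
        tokens = self.preprocess(headline)
        self.headline_counts[category] += 1
        self.vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            self.context_counts[category][prev] += 1
            self.bigram_counts[category][(prev, word)] += 1

    def log_score(self, headline, category):
        # log P(C) + sum of log P(word | previous word, C), with add-one smoothing.
        tokens = self.preprocess(headline)
        vocab_size = len(self.vocab)
        total = sum(self.headline_counts.values())
        score = math.log(self.headline_counts[category] / total)
        for prev, word in zip(tokens, tokens[1:]):
            numerator = self.bigram_counts[category][(prev, word)] + 1
            denominator = self.context_counts[category][prev] + vocab_size
            score += math.log(numerator / denominator)
        return score

    def predict(self, headline):
        # Prediction module: return the category with the highest score.
        return max(self.headline_counts, key=lambda c: self.log_score(headline, c))
```

To use the sketch, call `train(headline, category)` once per labeled headline, then `predict(headline)` on an unlabeled one. Working in log space avoids underflow when multiplying many small probabilities, and smoothing keeps unseen word pairs from driving a category's score to zero.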