Title :
Internet news headlines classification method based on the N-Gram language model
Author :
Liu, Xin ; Rujia, Gao ; Liufu, Song
Author_Institution :
Comput. Sci. & Eng., Changchun Univ. of Technol., Changchun, China
Abstract :
This paper addresses the classification of Internet news headlines, a form of short text. After analyzing traditional classification models and the characteristics of Internet news headlines, the paper proposes using an N-Gram language model as the classification model for these headlines. The classification process is divided into three modules: a preprocessing module, a training module, and a prediction module. A classification algorithm for Internet news headlines is designed on the basis of the N-Gram language model. The algorithm classifies a headline by calculating the probability of the unclassified word string under each category C; in calculating this probability it also takes into account the dependence of each word on the preceding term, which the authors credit for its better classification performance.
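The three modules described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bigram order (N = 2), add-one smoothing, whitespace tokenization, the category prior, and all names (`preprocess`, `BigramHeadlineClassifier`, `BOS`) are assumptions made for illustration. Headlines in languages without spaces, such as Chinese, would need a word segmenter in the preprocessing step.

```python
import math
from collections import Counter, defaultdict

BOS = "<s>"  # hypothetical start-of-headline marker


def preprocess(headline):
    """Preprocessing module (assumed): lowercase and split on whitespace."""
    return [BOS] + headline.lower().split()


class BigramHeadlineClassifier:
    """Per-category bigram model with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.bigrams = defaultdict(Counter)   # category -> (prev, word) counts
        self.contexts = defaultdict(Counter)  # category -> prev-word counts
        self.docs = Counter()                 # category -> number of headlines
        self.vocab = set()

    def train(self, labeled_headlines):
        """Training module: collect frequency statistics for each category."""
        for headline, category in labeled_headlines:
            tokens = preprocess(headline)
            self.docs[category] += 1
            self.vocab.update(tokens[1:])
            for prev, word in zip(tokens, tokens[1:]):
                self.bigrams[category][(prev, word)] += 1
                self.contexts[category][prev] += 1

    def log_score(self, headline, category):
        """log P(C) + sum of log P(w_i | w_{i-1}, C) over the word string."""
        tokens = preprocess(headline)
        v = len(self.vocab) + 1  # one extra slot for unseen words
        score = math.log(self.docs[category] / sum(self.docs.values()))
        for prev, word in zip(tokens, tokens[1:]):
            num = self.bigrams[category][(prev, word)] + 1
            den = self.contexts[category][prev] + v
            score += math.log(num / den)
        return score

    def predict(self, headline):
        """Prediction module: return the category with the highest score."""
        return max(self.docs, key=lambda c: self.log_score(headline, c))


if __name__ == "__main__":
    clf = BigramHeadlineClassifier()
    clf.train([
        ("team wins final match", "sports"),
        ("striker scores late goal", "sports"),
        ("stocks fall as markets slide", "finance"),
        ("bank raises interest rates", "finance"),
    ])
    print(clf.predict("team wins match"))
```

Scores are summed in log space so that the product of many small probabilities does not underflow, and conditioning each word on its predecessor is what lets the model use the relevance of the previous term.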
Keywords :
Internet; computational linguistics; information resources; pattern classification; probability; text analysis; Internet news headlines text classification method; N-Gram language model; probability value; Editorials; Frequency statistics; Internet news headlines; short text classification
Conference_Title :
Computer Science and Information Processing (CSIP), 2012 International Conference on
Conference_Location :
Xi'an, Shaanxi
Print_ISBN :
978-1-4673-1410-7
DOI :
10.1109/CSIP.2012.6308980