DocumentCode :
1782531
Title :
Bangla news classification using naive Bayes classifier
Author :
Chy, Abu Nowshed ; Seddiqui, Md Hanif ; Das, S.
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Chittagong, Chittagong, Bangladesh
fYear :
2014
fDate :
8-10 March 2014
Firstpage :
366
Lastpage :
371
Abstract :
The Web is gigantic and constantly updated. Bangla news on the Web is growing rapidly in the information age, and each news site has its own layout and categorization for grouping news. This heterogeneity of layout and categorization cannot always satisfy an individual user's needs. Removing this heterogeneity and classifying news articles according to user preference is a formidable task. In this paper, we propose an approach that lets a user find news articles related to a specific category. We use our own web crawler to extract useful text from the HTML pages of news articles and construct a Full-Text-RSS. The content of each news article is tokenized with a modified lightweight Bangla Stemmer. To achieve better classification results, we remove the less significant words, i.e. stop words, from the document. We apply the naive Bayes classifier to classify Bangla news article content based on the IPTC news codes. Our experimental results show the effectiveness of our classification system.
Keywords :
Bayes methods; Internet; Web sites; information retrieval; pattern classification; text analysis; Bangla Stemmer; Bangla news article content classification; Full-Text-RSS; HTML pages; IPTC news code; Web crawler; naive Bayes classifier; news grouping; news site; useful text extraction; user preference; Computers; Dictionaries; Information technology; Layout; Taxonomy; Training; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer and Information Technology (ICCIT), 2013 16th International Conference on
Conference_Location :
Khulna
Type :
conf
DOI :
10.1109/ICCITechn.2014.6997369
Filename :
6997369