Title :
Bangla news classification using naive Bayes classifier
Author :
Chy, Abu Nowshed ; Seddiqui, Md Hanif ; Das, S.
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Chittagong, Chittagong, Bangladesh
Abstract :
The Web is vast and constantly updated. Bangla news on the Web has grown rapidly in the information age, and each news site has its own layout and categorization for grouping news. This heterogeneity of layout and categorization cannot always satisfy an individual user's needs. Removing this heterogeneity and classifying news articles according to user preference is a formidable task. In this paper, we propose an approach that lets a user find news articles related to a specific category. We use our own web crawler to extract useful text from the HTML pages of news articles and construct a Full-Text-RSS. The content of each news article is tokenized with a modified lightweight Bangla Stemmer. To achieve better classification results, we remove the less significant words, i.e., stop words, from each document. We apply the naive Bayes classifier to classify Bangla news article content based on the IPTC news codes. Our experimental results show the effectiveness of our classification system.
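The classification step described in the abstract (stop-word removal followed by a naive Bayes classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the category labels, the toy English documents, and the use of add-one (Laplace) smoothing are all assumptions made for illustration, and the crawler and Bangla stemming steps are omitted.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list; the paper's actual Bangla list is not given here.
STOP_WORDS = {"the", "a", "of", "in", "and"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words (no stemming)."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an assumed variant)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        for text, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / self.total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data with hypothetical category labels.
docs = [
    "the team won the match",
    "the minister announced the budget",
    "a striker scored in the final",
    "parliament passed the budget bill",
]
labels = ["sport", "politics", "sport", "politics"]
model = NaiveBayes().fit(docs, labels)
```

Calling `model.predict(...)` on a new article text returns the category with the highest posterior score. Working in log space avoids numeric underflow when many token probabilities are multiplied.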
Keywords :
Bayes methods; Internet; Web sites; information retrieval; pattern classification; text analysis; Bangla Stemmer; Bangla news article content classification; Full-Text-RSS; HTML pages; IPTC news code; Web crawler; naive Bayes classifier; news grouping; news site; useful text extraction; user preference; Computers; Dictionaries; Information technology; Layout; Taxonomy; Training; Vectors;
Conference_Title :
Computer and Information Technology (ICCIT), 2013 16th International Conference on
Conference_Location :
Khulna
DOI :
10.1109/ICCITechn.2014.6997369