Title :
Automatic text categorization: Marathi documents
Author :
Jaydeep Jalindar Patil;Nagaraju Bogiri
Author_Institution :
Department of Computer Engineering (Computer Networks), K. J. College of Engineering & Management Research, Pune, India
Abstract :
Information technology has generated a huge amount of data on the internet. Initially this data was mainly in English, so the majority of data mining research has focused on English text documents. As internet usage increased, so did the amount of data in other languages such as Marathi, Tamil, Telugu and Punjabi. This paper presents a retrieval system for Marathi-language documents based on the user profile, which takes into account the user's interests and browsing history. The system shows Marathi documents to the end user according to this profile. Automatic text categorization supports better management and retrieval of these text documents and makes document retrieval a simple task. This paper discusses automatic text categorization of Marathi documents and surveys the related work in this area. Various learning techniques exist for classifying text documents, such as Naïve Bayes, Support Vector Machines and Decision Trees. Clustering techniques used for text categorization include the Label Induction Grouping Algorithm (LINGO), Suffix Tree Clustering and K-means. The literature survey indicates that, for non-English documents, the Vector Space Model (VSM) gives better results than other models. The system categorizes Marathi documents using the LINGO algorithm, which is based on the VSM. The system uses a dataset of 200 documents from 20 different categories. The results indicate that the LINGO clustering algorithm is efficient for Marathi text documents.
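The Vector Space Model mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and it is not LINGO itself; it only shows the TF-IDF document vectors and cosine similarity that VSM-based methods build on. The function names, the whitespace tokenizer, and the three toy Marathi documents are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    Assumption: whitespace tokenization, with no stemming or stop-word removal.
    """
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: two sports documents and one farming document.
docs = [
    "क्रिकेट सामना भारत",
    "क्रिकेट खेळाडू सामना",
    "शेती पाऊस पीक",
]
vectors = tfidf_vectors(docs)
sports_similarity = cosine(vectors[0], vectors[1])
cross_topic_similarity = cosine(vectors[0], vectors[2])
```

In this toy corpus the two sports documents share terms and so score a higher cosine similarity than the sports and farming pair, which share none. A clustering algorithm working in this vector space would be expected to group documents along such similarities.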
Keywords :
"Clustering algorithms","Matrix decomposition","Text categorization","Internet","Algorithm design and analysis","Classification algorithms","Search engines"
Conference_Title :
Energy Systems and Applications, 2015 International Conference on
DOI :
10.1109/ICESA.2015.7503438