DocumentCode :
590946
Title :
Improvement in automatic classification of Persian documents by means of Naïve Bayes and Representative Vector
Author :
Jafari, Aghil ; Hosseinejad, M. ; Amiri, Ali
Author_Institution :
Islamic Azad Univ. of Zanjan, Zanjan, Iran
fYear :
2011
fDate :
13-14 Oct. 2011
Firstpage :
226
Lastpage :
229
Abstract :
A Representative Vector is a vector that holds related words together with the degree of their relationship. This paper examines the effect of using such vectors on the automatic classification of Persian documents. In the proposed method, the documents are first preprocessed: extra words and word stems are identified. Next, features are extracted for each category using one of the known methods. The Representative Vector, built from the extracted features, then yields more specific words that better represent each category. Experimental results show that precision and recall can be increased significantly by omitting extra words and adding a few words to the Representative Vectors, combined with a well-known classification model such as Naïve Bayes.
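The pipeline outlined in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the Representative Vector is built, so the co-occurrence weighting in `representative_vector` is an assumption, as are the add-one smoothing, the tiny `EXTRA_WORDS` list, and the English placeholder tokens in `DOCS` (a Persian stop list and stemmer would take their place). Stemming is omitted, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical "extra word" (stop-word) list; English placeholders only.
EXTRA_WORDS = {"and", "in", "of", "the"}

# Toy labelled corpus, invented purely for illustration.
DOCS = [
    ("football and team in stadium", "sport"),
    ("team of football match", "sport"),
    ("market and stock in exchange", "economy"),
    ("stock of exchange market", "economy"),
]

def preprocess(text):
    """Split on whitespace and drop extra words (no stemming in this sketch)."""
    return [w for w in text.split() if w not in EXTRA_WORDS]

def train(docs):
    """Collect per-category word counts and per-category document counts."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, category in docs:
        word_counts[category].update(preprocess(text))
        doc_counts[category] += 1
    return word_counts, doc_counts

def classify(text, word_counts, doc_counts):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing, in log space."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    tokens = preprocess(text)
    best_category, best_score = None, -math.inf
    for category in sorted(doc_counts):
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)
        for w in tokens:
            score += math.log(
                (word_counts[category][w] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_category, best_score = category, score
    return best_category

def representative_vector(seeds, token_lists, top_k=3):
    """ASSUMED construction: weight each non-seed word by the fraction of
    seed-containing documents in which it also appears."""
    seeds = set(seeds)
    with_seed = [set(tokens) for tokens in token_lists if seeds & set(tokens)]
    co_occurrence = Counter(w for doc in with_seed for w in doc - seeds)
    n = len(with_seed) or 1  # avoid division by zero when no document matches
    return {w: c / n for w, c in co_occurrence.most_common(top_k)}
```

Under these assumptions, the words returned by `representative_vector` for a category's seed features would be appended to that category's feature set before training, which is one plausible reading of "addition of few words in the Representative Vectors".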
Keywords :
Bayes methods; classification; document handling; Naive Bayes; Persian documents; automatic classification model; feature extraction; representative vector; Computers; Educational institutions; Information retrieval; Semantics; Support vector machine classification; Text categorization; Vectors; Documents Classification; Naïve Bayes Classifier; Representative Vector; Stemming;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer and Knowledge Engineering (ICCKE), 2011 1st International eConference on
Conference_Location :
Mashhad
Print_ISBN :
978-1-4673-5712-8
Type :
conf
DOI :
10.1109/ICCKE.2011.6413355
Filename :
6413355