DocumentCode :
1971310
Title :
Document classification efficiency of phrase-based techniques
Author :
Kapalavayi, Nagesh ; Murthy, S. N. Jayaram ; Hu, Gongzhu
Author_Institution :
Dept. of Comput. Sci., Central Michigan Univ., Mount Pleasant, MI
fYear :
2009
fDate :
10-13 May 2009
Firstpage :
174
Lastpage :
178
Abstract :
Due to the exponential growth of text documents available in digital form, it is of great importance to develop techniques for automatic document classification based on textual content. Earlier document classification techniques used keyword-based features and related statistics and achieved good results on certain data sets. More recently, some of these techniques have been extended to include phrase-based and concept-based features to achieve better results. Because the data sets used by the different research groups differ markedly in their characteristics, the efficiency of these methods cannot be compared directly. In this paper, we present a study that uses the same data set to compare the efficiency of a phrase-based technique with that of keyword-based techniques. The results show conclusively that the use of phrase-based features is very effective in document classification.
Keywords :
classification; statistical analysis; text analysis; document classification; keyword based feature; phrase based technique; statistical dataset; text document; textual content; Computer science; Data engineering; Data mining; Databases; Information retrieval; Natural language processing; Programming profession; Statistics; Synthetic aperture sonar; Text mining; document classification; keyword-based and phrase-based features; text mining
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Systems and Applications, 2009. AICCSA 2009. IEEE/ACS International Conference on
Conference_Location :
Rabat
Print_ISBN :
978-1-4244-3807-5
Electronic_ISBN :
978-1-4244-3806-8
Type :
conf
DOI :
10.1109/AICCSA.2009.5069321
Filename :
5069321