Title :
Document classification efficiency of phrase-based techniques
Author :
Kapalavayi, Nagesh ; Murthy, S. N Jayaram ; Hu, Gongzhu
Author_Institution :
Dept. of Comput. Sci., Central Michigan Univ., Mount Pleasant, MI
Abstract :
Due to the exponential growth of available text documents in digital form, it is of great importance to develop techniques for automatic document classification based on textual content. Earlier document classification techniques have used keyword-based features and related statistics to achieve good results when applied to certain data sets. More recently, some of these techniques have been extended to include phrase-based and concept-based features to achieve better results. Since the characteristics of the data sets used by each of these research groups are remarkably different, it is not possible to compare the efficiency of these methods. In this paper, we present a study that uses the same data set to compare the efficiency of a phrase-based technique with keyword-based techniques. Results prove conclusively that the use of phrase-based features is very effective in document classification.
Keywords :
classification; statistical analysis; text analysis; document classification; keyword based feature; phrase based technique; statistical dataset; text document; textual content; Computer science; Data engineering; Data mining; Databases; Information retrieval; Natural language processing; Programming profession; Statistics; Synthetic aperture sonar; Text mining; document classification; keyword-based and phrase-based features; text mining
Conference_Title :
Computer Systems and Applications, 2009. AICCSA 2009. IEEE/ACS International Conference on
Conference_Location :
Rabat
Print_ISBN :
978-1-4244-3807-5
Electronic_ISBN :
978-1-4244-3806-8
DOI :
10.1109/AICCSA.2009.5069321