Title :
Feature Selection for Document Type Classification
Author :
Taghva, Kazem ; Vergara, Jason
Author_Institution :
Univ. of Nevada, Las Vegas
Abstract :
In this paper, we report on the identification of document type using a k-dependence Bayesian categorization engine. In particular, we show that the use of font and capitalization as features improves precision and recall.
Keywords :
Bayes methods; character sets; classification; document handling; capitalization; document type classification; feature selection; font; k-dependence Bayesian categorization engine; Bayesian methods; Computer networks; Data mining; Engines; Information science; Information technology; Mutual information; Optical character recognition software; Text categorization; Training data; OCR; document classification; document type
Conference_Title :
Information Technology: New Generations, 2008. ITNG 2008. Fifth International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
0-7695-3099-0
DOI :
10.1109/ITNG.2008.25