Title :
Automatic text categorization using a system of high-precision and high-recall models
Author :
Dai Li ; Murphey, Yi L.
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Michigan, Dearborn, MI, USA
Abstract :
This paper presents HPHR, an automatic text document categorization system. HPHR contains high-precision, high-recall, and noise-filtered text categorization models. The models are generated through a suite of machine learning algorithms, a fast clustering algorithm that efficiently and effectively groups documents into subcategories, and a text category generation algorithm that automatically generates, from a given set of training documents, text subcategories representing the high-precision, high-recall, and noise-filtered models. The HPHR system was evaluated on documents drawn from two different applications: vehicle fault diagnostic documents, which take the form of unstructured, verbatim text descriptions, and the Reuters corpus. On both document collections, HPHR outperformed systems commonly used in text document categorization.
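For readers unfamiliar with the two measures named in the title, the following is a minimal sketch of how per-category precision and recall are conventionally computed. It is not the HPHR method, which the abstract does not specify in detail; the function name `precision_recall` and the sample labels are hypothetical and chosen only for illustration.

```python
def precision_recall(true_labels, predicted_labels, category):
    """Return (precision, recall) for one category.

    Standard definitions, not taken from the HPHR paper:
      precision = TP / (TP + FP)
      recall    = TP / (TP + FN)
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)
    # Report 0.0 when a denominator is empty rather than dividing by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for four documents (illustrative only).
true_labels = ["brake", "brake", "engine", "engine"]
predicted_labels = ["brake", "engine", "engine", "engine"]

brake_precision, brake_recall = precision_recall(true_labels, predicted_labels, "brake")
engine_precision, engine_recall = precision_recall(true_labels, predicted_labels, "engine")
```

In this toy data the "brake" model never fires wrongly but misses one document (high precision, lower recall), while the "engine" model catches every engine document at the cost of one false alarm (high recall, lower precision).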
Keywords :
data mining; learning (artificial intelligence); pattern clustering; text analysis; HPHR; Reuters corpus; automatic text document categorization system; clustering algorithm; high-precision and high-recall models; machine learning algorithms; text mining; vehicle fault diagnostic documents; Algorithm design and analysis; Clustering algorithms; Machine learning algorithms; Text categorization; Training; Training data; Vectors;
Conference_Title :
Computational Intelligence and Data Mining (CIDM), 2014 IEEE Symposium on
Conference_Location :
Orlando, FL
DOI :
10.1109/CIDM.2014.7008692