Title :
A decomposition scheme based on error-correcting output codes for ensembles of text categorizers
Author :
Adeva, Juan José Garcia ; Calvo, Rafael A.
Author_Institution :
Sch. of Electr. & Inf. Eng., Sydney Univ., NSW, Australia
Abstract :
Error-correcting output codes (ECOC) are commonly used to decompose a multicategory problem into many dichotomies, so that the text categorisation task is performed by an ensemble of binary classifiers instead of a single monolithic classifier. The performance of the ensemble depends largely on the characteristics of the decomposition. We propose a decomposition approach in which both the categories and the classifiers are well separated, in order to maximise the decision boundaries and minimise correlated predictions. We apply this design to the El Mundo corpus (newspaper articles in Spanish) and to the well-known ModApte split of the Reuters-21578 corpus. The results obtained with ensembles compare favourably with those obtained with a monolithic classifier.
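As an illustration of the ECOC idea summarised in the abstract, the following minimal Python sketch may help. It is not the decomposition proposed in the paper: the category names are hypothetical, the code matrix is the standard exhaustive code for four classes, and the helper names (`row_separation`, `column_separation`, `decode`) are assumptions made for this example. Each row is the codeword of a category, each column defines the dichotomy learned by one binary classifier, and a document is assigned to the category whose codeword is nearest in Hamming distance to the bits predicted by the ensemble.

```python
from itertools import combinations

# Hypothetical categories with the exhaustive 7-bit code for 4 classes.
# Row = codeword of a category; column = dichotomy for one binary classifier.
CODE = {
    "sports":   (1, 1, 1, 1, 1, 1, 1),
    "politics": (0, 0, 0, 0, 1, 1, 1),
    "economy":  (0, 0, 1, 1, 0, 0, 1),
    "culture":  (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions at which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def row_separation(code):
    """Smallest Hamming distance between any two category codewords."""
    return min(hamming(a, b) for a, b in combinations(code.values(), 2))


def column_separation(code):
    """Smallest distance between any two dichotomies.

    A column and its complement define the same dichotomy, so each pair
    is compared both directly and against the complement.
    """
    columns = list(zip(*code.values()))
    n = len(code)
    return min(
        min(hamming(a, b), n - hamming(a, b))
        for a, b in combinations(columns, 2)
    )


def decode(predicted_bits, code):
    """Return the category whose codeword is closest to the predicted bits."""
    return min(code, key=lambda label: hamming(code[label], predicted_bits))


# One classifier (the first) votes wrongly for an "economy" document.
noisy = (1, 0, 1, 1, 0, 0, 1)
predicted = decode(noisy, CODE)
```

With a minimum row distance of d, nearest-codeword decoding tolerates up to floor((d - 1) / 2) wrong binary predictions, which is why well-separated rows matter. Well-separated columns matter for a different reason: near-identical or complementary dichotomies tend to produce classifiers that make the same mistakes.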
Keywords :
character recognition; error correction codes; text analysis; word processing; ECOC; El Mundo corpus; ModApte split; Reuters-21578 corpus; Spanish newspaper article; binary classifier ensemble; correlated prediction minimization; decision boundary maximization; decomposition approach; decomposition characteristics; decomposition scheme; dichotomy; error-correcting output codes; multicategory problem decomposition; single monolithic classifier; text categorisation; Australia; Automatic testing; Content management; Costs; Information technology; Internet; Machine learning; Machine learning algorithms; Management information systems; Text categorization
Conference_Title :
Information Technology and Applications, 2005. ICITA 2005. Third International Conference on
Print_ISBN :
0-7695-2316-1
DOI :
10.1109/ICITA.2005.9