DocumentCode
1724106
Title
Identifying problematic classes in text classification
Author
Roberts, Paul J. ; Howroyd, John ; Mitchell, Richard ; Ruiz, Virginie
Author_Institution
Univ. of Reading, Reading, UK
fYear
2010
Firstpage
1
Lastpage
6
Abstract
Real-world text classification tasks often suffer from poor class structure, with many overlapping classes and blurred boundaries. Training data pooled from multiple sources tend to be inconsistent and contain erroneous labelling, leading to poor performance of standard text classifiers. The classification of health service products to specialized procurement classes is used to examine and quantify the extent of these problems. A novel method is presented to analyze the labelled data by selectively merging classes where there is not enough information for the classifier to distinguish them. Initial results show the method can identify the most problematic classes, which can then either serve as a focus for improving the training data or be merged to increase confidence in the predicted results of the classifier.
Keywords
pattern classification; procurement; text analysis; blurred boundary; class merging; class structure; health service product; problematic classes identification; procurement class; text classification; text classifier; training data; Companies; Mutual information; Noise; Robustness; Systematics; Training
fLanguage
English
Publisher
IEEE
Conference_Titel
Cybernetic Intelligent Systems (CIS), 2010 IEEE 9th International Conference on
Conference_Location
Reading
Print_ISBN
978-1-4244-9023-3
Electronic_ISBN
978-1-4244-9024-0
Type
conf
DOI
10.1109/UKRICIS.2010.5898142
Filename
5898142