Purifying training data to improve performance of multi-label classification algorithms

Author

Kanj, Sawsan ; Abdallah, Fahed ; Denoux, Thierry

Author_Institution

HEUDIASYC, Univ. de Technol. de Compiègne, Compiègne, France

fYear

2012

fDate

9-12 July 2012

Firstpage

1784

Lastpage

1791

Abstract

Multi-label classification assumes that each object in the training set is associated with a set of labels, and the goal is to assign a set of labels to each unseen instance. Algorithms based on k-nearest neighbors address the multi-label problem by using the information carried by the neighbors of the instance to be classified. Because of problems such as errors in the input vectors or in their labels, this information may be wrong and may cause the multi-label algorithm to fail. In this paper, we propose a simple algorithm that edits out some training instances, selected by a vote among several metrics, in order to purify the existing training sample. This purifying approach is applied to the recently proposed evidential k-nearest neighbors method for multi-label classification. Comparative experimental results on various data sets demonstrate the usefulness and effectiveness of our approach.
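
The general idea of editing a training set by a vote among metrics can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not state which metrics, thresholds, or voting rule are used, and the paper pairs the editing step with an evidential k-nearest neighbors classifier. The sketch below swaps in a plain majority-vote k-NN predictor, and the three metrics (Hamming loss, Jaccard similarity, exact match), the thresholds, and the names `knn_label_vote` and `purify` are all assumptions chosen for illustration.

```python
import numpy as np


def knn_label_vote(X, Y, i, k):
    """Leave-one-out prediction of the label set of instance i.

    Assumption: plain k-NN with a per-label majority vote, used here
    as a stand-in for the evidential k-NN rule named in the abstract.
    """
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf  # exclude the instance itself
    nn = np.argsort(d)[:k]
    # A label is predicted when at least half of the neighbors carry it.
    return (Y[nn].mean(axis=0) >= 0.5).astype(int)


def purify(X, Y, k=3, hamming_max=0.5, jaccard_min=0.5, min_votes=2):
    """Return the indices of the training instances that are kept.

    Each metric casts one vote for removal when the instance's own labels
    disagree with the labels predicted from its neighbors. The metrics and
    thresholds are hypothetical, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=int)
    keep = []
    for i in range(len(X)):
        pred = knn_label_vote(X, Y, i, k)
        hamming = np.mean(pred != Y[i])
        union = np.sum(pred | Y[i])
        jaccard = np.sum(pred & Y[i]) / union if union else 1.0
        exact = np.array_equal(pred, Y[i])
        votes = int(hamming > hamming_max) + int(jaccard < jaccard_min) + int(not exact)
        if votes < min_votes:
            keep.append(i)
    return np.array(keep, dtype=int)
```

Calling `purify(X, Y)` on a feature matrix `X` and a binary label-indicator matrix `Y` returns the row indices to retain; a classifier would then be trained on `X[keep]` and `Y[keep]`. Requiring at least two of three votes is one possible design choice that avoids discarding an instance because of a single strict metric such as exact match.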

Keywords

data analysis; learning (artificial intelligence); pattern classification; evidential k-nearest neighbors; input vectors; label assignment; machine learning; multilabel classification algorithm; training data purification; training instance; Classification algorithms; Loss measurement; Noise measurement; Training; Training data; Vectors;

fLanguage

English

Publisher

IEEE

Conference_Titel

Information Fusion (FUSION), 2012 15th International Conference on

Conference_Location

Singapore

Print_ISBN

978-1-4673-0417-7

Electronic_ISBN

978-0-9824438-4-2

Type

conf

Filename

6290519