Experimental analysis of new algorithms for learning ternary classifiers

Author

Zucker, Jean-Daniel ; Chevaleyre, Yann ; Van Sang, Dao

Author_Institution

IRD France Nord, UMMISCO, Bondy, France

fYear

2015

fDate

25-28 Jan. 2015

Firstpage

19

Lastpage

24

Abstract

Discrete linear classifiers form a very sparse class of decision models that has proved useful for reducing overfitting in very high-dimensional learning problems. However, learning a discrete linear classifier is known to be a difficult problem: it requires finding a discrete linear model that minimizes the classification error over a given sample. A ternary classifier is a classifier defined by a pair (w, r), where w is a vector in {-1, 0, +1}ⁿ and r is a nonnegative real capturing the threshold or offset. The goal of the learning algorithm is to find a vector of weights in {-1, 0, +1}ⁿ that minimizes the hinge loss of the linear model on the training data. This problem is NP-hard, and one approach consists in solving the relaxed continuous problem exactly and then heuristically deriving discrete solutions. A recent paper by the authors introduced a randomized rounding algorithm [1]; in this paper we propose more sophisticated algorithms that improve the generalization error. These algorithms are presented and their performance is analyzed experimentally. Our results show that this kind of compact model can address the complex problem of learning predictors from bioinformatics data, such as metagenomics data, where the number of samples is much smaller than the number of attributes. The new algorithms improve on the state-of-the-art algorithm for learning ternary classifiers. This improvement comes at the expense of time complexity.
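
As an illustration of the objects named in the abstract, the following is a minimal sketch of a ternary classifier (w, r), its average hinge loss, and a generic randomized rounding of a relaxed weight vector. It is not the algorithm of [1] or of this paper: the decision rule (predict +1 when w·x ≥ r), the rounding rule (round each relaxed weight to its sign with probability equal to its absolute value, else to 0), and the function names `predict`, `hinge_loss`, `randomized_round`, and `best_of_k_roundings` are all assumptions made for the example. The relaxed vector is assumed to lie in [-1, 1]ⁿ and to come from some continuous solver that is not shown.

```python
import random


def predict(w, r, x):
    """Assumed decision rule: +1 if w.x >= r, otherwise -1."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score >= r else -1


def hinge_loss(w, r, X, y):
    """Average hinge loss max(0, 1 - y * (w.x - r)) over the sample."""
    total = 0.0
    for x, label in zip(X, y):
        margin = label * (sum(wi * xi for wi, xi in zip(w, x)) - r)
        total += max(0.0, 1.0 - margin)
    return total / len(X)


def randomized_round(w_relaxed, rng):
    """Round each weight in [-1, 1] to sign(w_i) with probability |w_i|, else 0.

    This is a generic rounding scheme (assumption), chosen so that the
    expected value of each rounded weight equals the relaxed weight.
    """
    rounded = []
    for wi in w_relaxed:
        if rng.random() < abs(wi):
            rounded.append(1 if wi > 0 else -1)
        else:
            rounded.append(0)
    return rounded


def best_of_k_roundings(w_relaxed, r, X, y, k=50, seed=0):
    """Draw k random roundings and keep the one with the lowest hinge loss."""
    rng = random.Random(seed)
    best_w, best_loss = None, float("inf")
    for _ in range(k):
        w = randomized_round(w_relaxed, rng)
        loss = hinge_loss(w, r, X, y)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss


if __name__ == "__main__":
    # Toy data (hypothetical): two samples, three attributes.
    X = [[2.0, 0.0, 1.0], [0.0, 1.0, 3.0]]
    y = [1, -1]
    w, loss = best_of_k_roundings([0.9, 0.1, -0.8], 0.0, X, y)
    print(w, loss)
```

Keeping the best of several independent roundings trades extra running time for a lower training loss, which mirrors in a very simple way the accuracy versus time trade-off mentioned at the end of the abstract.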

Keywords

bioinformatics; computational complexity; generalisation (artificial intelligence); learning (artificial intelligence); pattern classification; vectors; NP-hard; bioinformatics data; classification error minimization; decision model; discrete linear classifier learning; generalization error; metagenomics; randomized rounding algorithm; ternary classifier learning; time complexity; vector; Algorithm design and analysis; Classification algorithms; Data models; Error analysis; Fasteners; Prediction algorithms; Vectors; Metagenomics data; Randomized Rounding; Ternary Classifier;

fLanguage

English

Publisher

IEEE

Conference_Titel

Computing & Communication Technologies - Research, Innovation, and Vision for the Future (RIVF), 2015 IEEE RIVF International Conference on

Conference_Location

Can Tho

Print_ISBN

978-1-4799-8043-7

Type

conf

DOI

10.1109/RIVF.2015.7049868

Filename

7049868