DocumentCode :
3125045
Title :
Healing Sample Selection Bias by Source Classifier Selection
Author :
Seah, Chun-Wei ; Tsang, Ivor Wai-Hung ; Ong, Yew-Soon
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
fYear :
2011
fDate :
11-14 Dec. 2011
Firstpage :
577
Lastpage :
586
Abstract :
Domain Adaptation (DA) methods usually proceed by reducing the marginal distribution difference between the source and target domains and then applying the resultant trained classifier, referred to as the source classifier, in the target domain. In many cases, however, the true predictive distributions of the source and target domains can differ greatly, especially when their class distributions are skewed, giving rise to sample selection bias in DA. DA methods that leverage the source labeled data may then generalize poorly in the target domain, resulting in negative transfer. In addition, we observe that many DA methods predict the target unlabeled data using either a single source classifier or a linear combination of source classifiers with fixed weights; essentially, the labels of the target unlabeled data are spanned by the predictions of these source classifiers. Motivated by these observations, we propose in this paper to construct many source classifiers of diverse biases and to learn a weight for each source classifier by directly minimizing the structural risk defined on the target unlabeled data, so as to heal the possible sample selection bias. Since the weights are learned by maximizing the margin of separation between opposite classes on the target unlabeled data, the proposed method is named Maximal Margin Target Label Learning (MMTLL); it takes the form of a Multiple Kernel Learning problem with many label kernels. Extensive experiments comparing MMTLL against several state-of-the-art methods on the Sentiment and Newsgroups datasets under various imbalanced class settings show that MMTLL achieves robust accuracy in all settings considered and is resilient to negative transfer, whereas the competing methods suffer significant losses in prediction accuracy.
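The weighting scheme the abstract describes can be illustrated with a small sketch. The Python below is not the authors' MMTLL solver (which is formulated as a Multiple Kernel Learning problem over label kernels); it is a minimal illustration, under the assumption that we are given the real-valued outputs of several source classifiers on the target unlabeled data, of learning simplex weights by minimizing an unlabeled hinge-style structural risk that rewards a large margin of separation. All names here (learn_classifier_weights, F, lam, lr) are hypothetical.

```python
import numpy as np

def simplex_projection(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    tau = (css[rho] - 1) / (rho + 1.0)
    return np.maximum(v - tau, 0.0)

def learn_classifier_weights(F, lam=0.1, lr=0.05, iters=500):
    """
    F: (n_target, m) array; F[i, j] is source classifier j's real-valued
       output on target unlabeled point i.
    Learns simplex weights w minimizing an unlabeled margin risk
        sum_i max(0, 1 - |F_i . w|) / n + lam * ||w||^2
    by projected subgradient descent (a simplified stand-in for the
    paper's MKL formulation with label kernels).
    """
    n, m = F.shape
    w = np.full(m, 1.0 / m)
    for _ in range(iters):
        scores = F @ w
        active = np.abs(scores) < 1.0  # target points inside the margin
        # subgradient of -|F_i . w| is -sign(F_i . w) * F_i on active points
        grad = -(np.sign(scores[active])[:, None] * F[active]).sum(axis=0) / n
        grad += 2.0 * lam * w
        w = simplex_projection(w - lr * grad)
    return w
```

Given learned weights w, the target labels would be read off as np.sign(F @ w); classifiers whose biases hurt the target margin receive weight near zero, which is one way to picture how selecting among diversely biased source classifiers can avoid negative transfer.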
Keywords :
learning (artificial intelligence); pattern classification; domain adaptation methods; marginal distribution differences; maximal margin target label learning; negative transfer; newsgroups datasets; predictive distributions; sample selection bias; sentiment datasets; source classifier selection; structural risk minimization; target unlabeled data; trained classifier; Accuracy; Complexity theory; Joints; Kernel; Machine learning; Support vector machines; Vectors; Classifier Selection; Domain Adaptation; Maximum Margin Separation; Multiple Kernel Learning; Negative Transfer; Sample Selection Bias;
fLanguage :
English
Publisher :
ieee
Conference_Title :
2011 IEEE 11th International Conference on Data Mining (ICDM)
Conference_Location :
Vancouver, BC, Canada
ISSN :
1550-4786
Print_ISBN :
978-1-4577-2075-8
Type :
conf
DOI :
10.1109/ICDM.2011.73
Filename :
6137262