DocumentCode
1159
Title
Transfer across Completely Different Feature Spaces via Spectral Embedding
Author
Xiaoxiao Shi ; Qi Liu ; Wei Fan ; Yu, Philip S.
Author_Institution
Dept. of Comput. Sci., Univ. of Illinois at Chicago, Chicago, IL, USA
Volume
25
Issue
4
fYear
2013
fDate
Apr-13
Firstpage
906
Lastpage
918
Abstract
In many applications, it is very expensive or time-consuming to obtain many labeled examples. One practically important problem is: can labeled data from other related sources help predict the target task, even if they have 1) different feature spaces (e.g., image versus text data), 2) different data distributions, and 3) different output spaces? This paper proposes a solution and discusses the conditions under which it is highly likely to produce better results. First, it unifies the feature spaces of the target and source data sets by spectral embedding, even when their feature spaces are completely different. The principle is to devise an optimization objective that preserves the original structure of the data while maximizing the similarity between the two data sets. Both a linear projection model and a nonlinear approach with closed-form solutions are derived from this principle. Second, a judicious sample selection strategy is applied to select only the related source examples. Finally, a Bayesian-based approach is applied to model the relationship between the different output spaces. Together, the three steps bridge related heterogeneous sources in order to learn the target task. Among the 20 experimental data sets, for example, images with wavelet-transform-based features are used to predict another set of images whose features are constructed in a color-histogram space, and documents are used to help image classification. By using the examples extracted from heterogeneous sources, the models reduce the error rate by as much as 50 percent compared with methods that use only the examples from the target task.
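The abstract describes the embedding step only at the level of its principle (preserve each data set's structure while maximizing the similarity between the two, solved in closed form). The sketch below illustrates one way such a closed-form spectral embedding can work; it is a simplified illustration under stated assumptions, not necessarily the paper's exact objective. Assumptions: the target matrix `T` and source matrix `S` have the same number of rows and are paired by row index, structure is captured by linear-kernel Gram matrices, and `beta` is a hypothetical weight on the similarity term. The function name `joint_spectral_embedding` is hypothetical. The objective maximizes tr(B_T^T T T^T B_T) + tr(B_S^T S S^T B_S) + 2·beta·tr(B_T^T B_S) over a stacked embedding with orthonormal columns, whose maximizer is the top-k eigenvectors of a symmetric block matrix.

```python
import numpy as np

def joint_spectral_embedding(T, S, k, beta=1.0):
    """Hypothetical sketch: map two data sets with different feature
    dimensions into one shared k-dimensional space in closed form.

    T: (n, p) target features; S: (n, q) source features. Assumes both
    have n rows paired by index. beta weights the similarity term.
    Returns (B_T, B_S), each of shape (n, k).
    """
    T = np.asarray(T, dtype=float)
    S = np.asarray(S, dtype=float)
    n = T.shape[0]
    if S.shape[0] != n:
        raise ValueError("this sketch assumes T and S have the same number of rows")

    # Linear-kernel Gram matrices encode each data set's own structure.
    K_T = T @ T.T
    K_S = S @ S.T

    # Diagonal blocks reward structure preservation; the off-diagonal
    # beta * I blocks reward B_T and B_S being similar row by row.
    I = np.eye(n)
    M = np.block([[K_T, beta * I], [beta * I, K_S]])

    # M is symmetric, so eigh applies; it returns ascending eigenvalues.
    # The top-k eigenvectors maximize tr(Z^T M Z) subject to Z^T Z = I.
    _, eigvecs = np.linalg.eigh(M)
    Z = eigvecs[:, ::-1][:, :k]
    return Z[:n], Z[n:]

# Usage: two toy data sets with 4 and 10 features (e.g. "image" vs "text").
rng = np.random.default_rng(0)
T = rng.normal(size=(6, 4))
S = rng.normal(size=(6, 10))
B_T, B_S = joint_spectral_embedding(T, S, k=2)
print(B_T.shape, B_S.shape)  # (6, 2) (6, 2)
```

Because the whole problem reduces to one symmetric eigendecomposition, the solution needs no iterative optimization; raising `beta` pulls the two embeddings closer together at the cost of preserving less of each data set's own structure.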
Keywords
Bayes methods; data analysis; document image processing; feature extraction; image classification; nonlinear programming; spectral analysis; wavelet transforms; Bayesian-based approach; color histogram space; data distribution; document processing; feature space; heterogeneous sources; image classification; image feature construction; linear projection model; nonlinear approach; optimization; sample selection strategy; source data sets; spectral embedding; target data sets; wavelet transformed-based feature; Bioinformatics; Bridges; Data models; Optimization; Training; Training data; Vectors; Feature generation; heterogeneous data; transfer learning;
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
ieee
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2011.252
Filename
6104043