Regional Information Center for Science and Technology (RICEST) - Exploiting unlabeled data for improving accuracy of predictive data mining

DocumentCode :

2369611

Title :

Exploiting unlabeled data for improving accuracy of predictive data mining

Author :

Peng, Kang ; Vucetic, Slobodan ; Han, Bo ; Xie, Hongbo ; Obradovic, Zoran

Author_Institution :

Center for Inf. Sci. & Technol., Temple Univ., Philadelphia, PA, USA

fYear :

2003

fDate :

19-22 Nov. 2003

Firstpage :

267

Lastpage :

274

Abstract :

Predictive data mining typically relies on labeled data without exploiting a much larger amount of available unlabeled data. We show that using unlabeled data can be beneficial in a range of important prediction problems and therefore should be an integral part of the learning process. Given an unlabeled dataset representative of the underlying distribution and a K-class labeled sample that might be biased, our approach is to learn K contrast classifiers, each trained to discriminate a certain class of labeled data from the unlabeled population. We illustrate that contrast classifiers can be useful in one-class classification, outlier detection, density estimation, and learning from biased data. The advantages of the proposed approach are demonstrated by an extensive evaluation on synthetic data followed by real-life bioinformatics applications for (1) ranking PubMed articles by their relevance to protein disorder and (2) cost-effective enlargement of a disordered protein database.
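
The contrast-classifier idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 1-D Gaussian toy data, the plain logistic-regression learner, and every name below (`train_logistic`, `contrast`, the class labels "A" and "B") are assumptions made purely for illustration. For each of K = 2 labeled classes, one classifier is trained to separate that class's labeled sample (target 1) from an equally sized unlabeled sample (target 0).

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(xs, ys, epochs=200, lr=0.1):
    """Fit 1-D logistic regression p(y=1|x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical toy data: class "A" is centered at -2 and class "B" at +2.
labeled = {
    "A": [rng.gauss(-2.0, 1.0) for _ in range(200)],
    "B": [rng.gauss(2.0, 1.0) for _ in range(200)],
}
# The unlabeled pool is assumed to represent the underlying population,
# modeled here as an equal mixture of both classes.
unlabeled = [rng.gauss(rng.choice([-2.0, 2.0]), 1.0) for _ in range(200)]

# One contrast classifier per class: labeled class sample (1) vs. unlabeled pool (0).
contrast = {}
for name, sample in labeled.items():
    xs = sample + unlabeled
    ys = [1.0] * len(sample) + [0.0] * len(unlabeled)
    w, b = train_logistic(xs, ys)
    contrast[name] = lambda x, w=w, b=b: sigmoid(w * x + b)

for x in (-2.0, 0.0, 2.0):
    scores = {name: round(cc(x), 3) for name, cc in contrast.items()}
    print(x, scores)
```

In this sketch, a low output from a class's contrast classifier at a point x suggests that x is underrepresented in that class's labeled sample relative to the unlabeled population, which is the kind of signal the abstract relates to outlier detection and learning from biased data. A linear model in one dimension is a deliberate simplification; any probabilistic classifier could stand in for `train_logistic`.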

Keywords :

data mining; learning (artificial intelligence); medical information systems; pattern classification; probability; very large databases; K contrast classifier learning; K-class labeled sample; PubMed article ranking; biased data; disordered protein database; one-class classification; outlier detection; prediction problem; predictive data mining accuracy improvement; real-life bioinformatics application; synthetic data; unlabeled data exploitation; Accuracy; Bioinformatics; Costs; Data mining; Databases; Information science; Labeling; Proteins; Sampling methods; Supervised learning;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Mining, 2003. ICDM 2003. Third IEEE International Conference on

Print_ISBN :

0-7695-1978-4

Type :

conf

DOI :

10.1109/ICDM.2003.1250929

Filename :

1250929

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2369611