DocumentCode
2222534
Title
Data acquisition with active and impact-sensitive instance selection
Author
Zhu, Xingquan ; Wu, Xindong
Author_Institution
Dept. of Comput. Sci., Vermont Univ., Burlington, VT, USA
fYear
2004
fDate
15-17 Nov. 2004
Firstpage
721
Lastpage
726
Abstract
Real-world data is never perfect and often suffers from corruption or missing values that may degrade models built from the data. To build accurate predictive models, data acquisition is usually adopted to complete the missing values in incomplete instances. Because of the significant cost of doing so and the inherent correlations in the dataset, acquiring complete information for all instances is likely both prohibitive and unnecessary. An interesting and important problem that arises here is selecting which instances to complete so that the model built from the data improves significantly. We propose two solutions to this problem; the essential idea is to complete the attributes that have a higher impact on system performance. The first solution is based on an impact-sensitive instance ranking mechanism [X. Zhu et al. (2004)]. We explore the correlation between each attribute and the class and use that correlation as the attribute's weight; the larger the weight, the higher the impact of the attribute. For each incomplete instance, we sum the weights of the attributes with missing values, and instances with larger sums are treated as more important for users to complete. In the second solution, active learning, impact-sensitive instance ranking, and missing value prediction are combined for data acquisition. Experimental results on real-world datasets demonstrate the effectiveness of our strategies.
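The first solution's ranking step can be illustrated with a minimal sketch. It rests on several assumptions that are not in the abstract: attributes and class labels are numeric, missing values are encoded as None, and absolute Pearson correlation stands in for the attribute-class correlation (the abstract does not say which correlation measure the authors use). The function names are hypothetical.

```python
import math


def _pearson(pairs):
    """Pearson correlation of (x, y) pairs; 0.0 when it is undefined."""
    if len(pairs) < 2:
        return 0.0
    xs, ys = zip(*pairs)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def attribute_weights(rows, labels):
    """Weight each attribute by |correlation with the class|.

    Assumption: only rows where the attribute is observed contribute.
    """
    n_attrs = len(rows[0])
    weights = []
    for j in range(n_attrs):
        observed = [(r[j], y) for r, y in zip(rows, labels) if r[j] is not None]
        weights.append(abs(_pearson(observed)))
    return weights


def rank_incomplete(rows, weights):
    """Return (row_index, score) for incomplete rows, highest score first.

    The score is the sum of the weights of the attributes that are missing.
    """
    scored = []
    for i, row in enumerate(rows):
        if any(v is None for v in row):
            score = sum(w for v, w in zip(row, weights) if v is None)
            scored.append((i, score))
    # Sort by descending score; ties fall back to the original row order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored
```

Under these assumptions, an instance missing several highly weighted attributes rises to the top of the list and would be the first candidate for acquisition. The second solution (combining this ranking with active learning and missing value prediction) is not covered by the sketch.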
Keywords
correlation methods; data acquisition; data mining; data models; learning (artificial intelligence); pattern classification; statistical analysis; active learning; data acquisition; instance ranking mechanism; missing attribute value prediction; real-world datasets; Bayesian methods; Computer science; Costs; Data acquisition; Data mining; Filling; Predictive models; Statistics; System performance; Testing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Tools with Artificial Intelligence, 2004. ICTAI 2004. 16th IEEE International Conference on
ISSN
1082-3409
Print_ISBN
0-7695-2236-X
Type
conf
DOI
10.1109/ICTAI.2004.46
Filename
1374260
Link To Document