Title :
Mining incomplete data with many attribute-concept values and "do not care" conditions
Author :
Patrick G. Clark; Jerzy W. Grzymala-Busse
Author_Institution :
Department of Electrical Eng. and Computer Sci., University of Kansas, Lawrence, KS 66045, USA
Abstract :
In this paper we present novel experimental results comparing two interpretations of missing attribute values: attribute-concept values and "do not care" conditions. Experiments were conducted on 12 data sets with many missing attribute values using the MLEM2 rule induction system. Three kinds of probabilistic approximations were used in the experiments: singleton, subset, and concept. The error rate of the induced rules was evaluated by ten-fold cross validation. The experiments compared the two interpretations of missing values, using the best result among the three probabilistic approximations for each. In two cases, better performance was achieved with attribute-concept values; in one case, better performance was achieved with "do not care" conditions. For the remaining three cases, the difference in performance was not statistically significant (5% significance level).
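The three probabilistic approximations named in the abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in the rough-set literature on incomplete data: each case x has a characteristic set K(x), a concept X is a set of cases, and a case qualifies when Pr(X | K(x)) = |X ∩ K(x)| / |K(x)| is at least a threshold alpha. The function names, the toy characteristic sets, and the value of alpha below are hypothetical and are not taken from the paper; how K(x) is derived from attribute-concept values or "do not care" conditions is not shown here.

```python
from fractions import Fraction


def conditional_probability(concept, k_set):
    """Pr(X | K(x)) = |X ∩ K(x)| / |K(x)| for a non-empty characteristic set."""
    return Fraction(len(concept & k_set), len(k_set))


def probabilistic_approximations(universe, concept, char_sets, alpha):
    """Return (singleton, subset, concept) probabilistic approximations of `concept`.

    char_sets maps each case x to its characteristic set K(x); the sets are
    assumed to be precomputed under the chosen interpretation of missing values.
    """
    # Cases whose characteristic set meets the probability threshold.
    qualifying = {
        x for x in universe
        if conditional_probability(concept, char_sets[x]) >= alpha
    }
    # Singleton: the qualifying cases themselves.
    singleton = set(qualifying)
    # Subset: union of K(x) over all qualifying cases in the universe.
    subset = set().union(*(char_sets[x] for x in qualifying))
    # Concept: union of K(x) over qualifying cases that belong to the concept.
    concept_approx = set().union(*(char_sets[x] for x in qualifying & concept))
    return singleton, subset, concept_approx


# Hypothetical toy data: five cases, one concept, hand-written characteristic sets.
universe = {1, 2, 3, 4, 5}
concept = {1, 2, 3}
char_sets = {1: {1, 2}, 2: {1, 2, 4}, 3: {3}, 4: {4, 5}, 5: {2, 5}}

singleton, subset, concept_approx = probabilistic_approximations(
    universe, concept, char_sets, alpha=Fraction(1, 2)
)
print(sorted(singleton), sorted(subset), sorted(concept_approx))
```

On this toy input the three approximations differ from one another, which is why an experiment may report the best result among the three for each interpretation of missing values.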
Keywords :
"Approximation methods","Probabilistic logic","Error analysis","Set theory","Humidity","Data mining","Temperature distribution"
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
DOI :
10.1109/BigData.2015.7363926