DocumentCode :
659404
Title :
Elastic algorithms for guaranteeing quality monotonicity in big data mining
Author :
Rui Han ; Lei Nie ; Ghanem, Moustafa M. ; Yike Guo
Author_Institution :
Imperial Coll. London, London, UK
fYear :
2013
fDate :
6-9 Oct. 2013
Firstpage :
45
Lastpage :
50
Abstract :
When mining large data volumes in big data applications, users are typically willing to use algorithms that produce acceptable approximate results satisfying the given resource and time constraints. Two key challenges arise when designing such algorithms. The first relates to reasoning about tradeoffs between the quality of data mining output (e.g., prediction accuracy for classification tasks) and the available resource and time budgets. The second is organizing the computation of the algorithm to guarantee that result quality improves as more budget is used. Little work has addressed these two challenges together in a generic way. In this paper, we propose a novel framework for developing elastic big data mining algorithms. Based on Shannon's entropy, an information-theoretic approach is introduced to reason about how result quality is affected by the allocated budget. This is then used to guide the development of algorithms that adapt to the available time budgets while guaranteeing better-quality results as more budget is used. We demonstrate the application of the framework by developing elastic k-Nearest Neighbour (kNN) classification and collaborative filtering (CF) recommendation algorithms as two examples. The core of both elastic algorithms is to use a naïve kNN classification or CF algorithm over R-tree data structures that successively approximate the entire datasets. Experimental evaluation was performed on real datasets using prediction accuracy as the quality metric. The results show that, in practice, the elastic mining algorithms do produce results whose observable quality, i.e., prediction accuracy, increases consistently.
Keywords :
Big Data; collaborative filtering; data mining; entropy; learning (artificial intelligence); pattern classification; CF recommendation algorithms; Shannon entropy; big data mining; data mining output; elastic algorithms; elastic k-nearest neighbour classification; information-theoretic approach; kNN classification; prediction accuracy; quality metric; quality monotonicity; resource budgets; resource constraints; time budgets; time constraints; Accuracy; Algorithm design and analysis; Approximation algorithms; Classification algorithms; Data mining; Encoding; Prediction algorithms; R-tree; elastic data mining algorithms
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
Type :
conf
DOI :
10.1109/BigData.2013.6691553
Filename :
6691553