DocumentCode
659404
Title
Elastic algorithms for guaranteeing quality monotonicity in big data mining
Author
Rui Han; Lei Nie; Moustafa M. Ghanem; Yike Guo
Author_Institution
Imperial College London, London, UK
fYear
2013
fDate
6-9 Oct. 2013
Firstpage
45
Lastpage
50
Abstract
When mining large data volumes in big data applications, users are typically willing to use algorithms that produce acceptable approximate results that satisfy the given resource and time constraints. Two key challenges arise when designing such algorithms. The first is reasoning about the tradeoff between the quality of the data mining output (e.g., prediction accuracy for classification tasks) and the available resource and time budgets. The second is organizing the algorithm's computation so that it is guaranteed to produce better-quality results as more budget is used. Little work has addressed these two challenges together in a generic way. In this paper, we propose a novel framework for developing elastic big data mining algorithms. Based on Shannon's entropy, an information-theoretic approach is introduced to reason about how result quality is affected by the allocated budget. This is then used to guide the development of algorithms that adapt to the available time budget while guaranteeing better-quality results as more budget is used. We demonstrate the application of the framework by developing elastic k-Nearest Neighbour (kNN) classification and collaborative filtering (CF) recommendation algorithms as two examples. The core of both elastic algorithms is to use a naïve kNN classification or CF algorithm over R-tree data structures that successively approximate the entire datasets. Experimental evaluation was performed on real datasets using prediction accuracy as the quality metric. The results show that, in practice, the elastic mining algorithms do produce results whose observable quality, i.e., prediction accuracy, increases consistently.
Keywords
Big Data; collaborative filtering; data mining; entropy; learning (artificial intelligence); pattern classification; CF recommendation algorithms; Shannon entropy; big data mining; data mining output; elastic algorithms; elastic k-nearest neighbour classification; information-theoretic approach; kNN classification; prediction accuracy; quality metric; quality monotonicity; resource budgets; resource constraints; time budgets; time constraints; Accuracy; Algorithm design and analysis; Approximation algorithms; Classification algorithms; Encoding; Prediction algorithms; R-tree; elastic data mining algorithms
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data, 2013 IEEE International Conference on
Conference_Location
Silicon Valley, CA
Type
conf
DOI
10.1109/BigData.2013.6691553
Filename
6691553