DocumentCode
2720761
Title
Efficient sample generation for scalable meta learning
Author
Schelter, Sebastian ; Soto, Juan ; Markl, Volker ; Burdick, Douglas ; Reinwald, Berthold ; Evfimievski, Alexandre
fYear
2015
fDate
13-17 April 2015
Firstpage
1191
Lastpage
1202
Abstract
Meta learning techniques such as cross-validation and ensemble learning are crucial for applying machine learning to real-world use cases. These techniques first generate samples from input data, and then train and evaluate machine learning models on these samples. For meta learning on large datasets, the efficient generation of samples becomes problematic, especially when the data is stored in a distributed, block-partitioned representation and processed on a shared-nothing cluster. We present a novel, parallel algorithm for efficient sample generation from large, block-partitioned datasets in a shared-nothing architecture. This algorithm executes in a single pass over the data and minimizes inter-machine communication. The algorithm supports a wide variety of sample generation techniques through an embedded user-defined sampling function. We illustrate how to implement distributed sample generation for popular meta learning techniques such as hold-out tests, k-fold cross-validation, and bagging using our algorithm, and we present an experimental evaluation on datasets with billions of datapoints.
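The idea of a single pass driven by a user-defined sampling function can be illustrated with a minimal, single-machine Python sketch. It is not the algorithm from the paper: the names `generate_samples`, `hold_out`, `k_fold`, `bagging`, and `poisson1` are hypothetical, the blocks are plain Python lists rather than distributed matrix blocks, and bagging is approximated by drawing per-row copy counts from a Poisson(1) distribution (a common stand-in for sampling with replacement, assumed here rather than taken from the source).

```python
import math
import random
from collections import defaultdict


def generate_samples(blocks, sampling_fn, seed=0):
    """Route every row to zero or more samples in one pass over the blocks."""
    samples = defaultdict(list)
    for block_id, block in enumerate(blocks):
        # One RNG per block, so each block could be handled independently.
        rng = random.Random(seed * 1_000_003 + block_id)
        for row in block:
            # The user-defined function decides which samples receive the row.
            for sample_id in sampling_fn(row, rng):
                samples[sample_id].append(row)
    return dict(samples)


def hold_out(test_fraction):
    """Each row goes to either the training set or the test set."""
    def fn(row, rng):
        return ["test"] if rng.random() < test_fraction else ["train"]
    return fn


def k_fold(k):
    """Each row is a test row in one fold and a training row in the others."""
    def fn(row, rng):
        fold = rng.randrange(k)
        return [("test", fold)] + [("train", j) for j in range(k) if j != fold]
    return fn


def poisson1(rng):
    """Draw from Poisson(1) by multiplying uniforms (Knuth's method)."""
    limit, count, product = math.exp(-1.0), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def bagging(num_bags):
    """Each row appears in every bag a Poisson(1)-distributed number of times."""
    def fn(row, rng):
        ids = []
        for bag in range(num_bags):
            ids.extend([bag] * poisson1(rng))
        return ids
    return fn
```

For example, `generate_samples(blocks, k_fold(5))` would return a dictionary keyed by `("train", j)` and `("test", j)`. Because the sampling function sees one row at a time, every sample is filled during the same scan of the data.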
Keywords
learning (artificial intelligence); parallel algorithms; sampling methods; block-partitioned datasets; distributed sample generation; efficient sample generation; embedded user-defined sampling function; ensemble learning; intermachine communication; machine learning; meta learning techniques; parallel algorithm; scalable meta learning; shared-nothing architecture; Data models; Distributed databases; Electronic mail; Indexes; Partitioning algorithms; Predictive models; Training;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering (ICDE), 2015 IEEE 31st International Conference on
Conference_Location
Seoul
Type
conf
DOI
10.1109/ICDE.2015.7113367
Filename
7113367