DocumentCode
239104
Title
A combined MapReduce-windowing two-level parallel scheme for evolutionary prototype generation
Author
Triguero, Isaac ; Peralta, Daniel ; Bacardit, Jaume ; Garcia, Sergio ; Herrera, Francisco
Author_Institution
Dept. of Comput. Sci. & Artificial Intell., Univ. of Granada, Granada, Spain
fYear
2014
fDate
6-11 July 2014
Firstpage
3036
Lastpage
3043
Abstract
Evolutionary prototype generation techniques have demonstrated their usefulness in improving the capabilities of the nearest neighbor classifier. They act as data reduction algorithms by generating representative points of a given problem. Their main purposes are to speed up the classification process and to reduce both the storage requirements and the noise sensitivity of the nearest neighbor rule. With the growing amount of available data, this kind of reduction technique is becoming increasingly important. However, the applicability of these methods can be limited to problems with no more than tens of thousands of instances. To address this limitation, in this work we develop a two-level parallelization scheme for evolutionary prototype generation methods. First, it distributes the execution of these algorithms across several tasks using a MapReduce framework. Then, within each of these tasks (mappers), it accelerates the prototype generation process with a windowing approach. This model enables evolutionary prototype generation algorithms to be applied to large-scale classification problems without accuracy loss. Our preliminary experiments on a dataset of 1 million instances show that this proposal is an appropriate tool for improving the performance of the nearest neighbor classifier on big data.
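As a rough illustration of the two-level scheme summarized in the abstract, the following Python sketch runs sequentially and only mimics the structure: the training set is split into disjoint partitions (the map level), each partition is reduced by an evolutionary search whose fitness is evaluated on one stratified window per generation (the windowing level), and the resulting prototype sets are concatenated (the reduce level). This is a minimal sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, the simple mutate-and-accept search is a stand-in for the actual evolutionary prototype generation methods, the round-robin stratified windows are an assumed windowing policy, and joining the mapper outputs by concatenation is an assumed reducer.

```python
import random
from collections import defaultdict


def nn_predict(prototypes, x):
    """1-NN rule: return the label of the closest prototype (squared Euclidean distance)."""
    best = min(prototypes, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    return best[1]


def accuracy(prototypes, data):
    """Fraction of (x, y) pairs in data correctly classified by the 1-NN rule."""
    return sum(nn_predict(prototypes, x) == y for x, y in data) / len(data)


def stratified_windows(data, n_windows, rng):
    """Assumed windowing policy: deal each class round-robin into disjoint windows."""
    by_class = defaultdict(list)
    for x, y in data:
        by_class[y].append((x, y))
    windows = [[] for _ in range(n_windows)]
    for items in by_class.values():
        rng.shuffle(items)
        for i, item in enumerate(items):
            windows[i % n_windows].append(item)
    return [w for w in windows if w]  # drop empty windows on tiny partitions


def windowed_evolutionary_pg(partition, n_per_class=2, n_windows=4,
                             generations=200, sigma=0.1, seed=0):
    """Hypothetical mapper: stand-in evolutionary prototype generation with windowing."""
    rng = random.Random(seed)
    windows = stratified_windows(partition, n_windows, rng)
    by_class = defaultdict(list)
    for x, y in partition:
        by_class[y].append(x)
    # Initial prototypes: a few random training samples of each class.
    protos = [(list(x), y) for y, xs in by_class.items()
              for x in rng.sample(xs, min(n_per_class, len(xs)))]
    for g in range(generations):
        window = windows[g % len(windows)]  # fitness uses only one window per generation
        child = [(list(p), y) for p, y in protos]
        i = rng.randrange(len(child))
        child[i] = ([v + rng.gauss(0.0, sigma) for v in child[i][0]], child[i][1])
        # Compare parent and child on the same window; keep the child if not worse.
        if accuracy(child, window) >= accuracy(protos, window):
            protos = child
    return protos


def map_reduce_pg(data, n_mappers=4, seed=0):
    """Hypothetical driver: map over disjoint partitions, then join the prototype sets."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    partitions = [p for p in (shuffled[i::n_mappers] for i in range(n_mappers)) if p]
    results = [windowed_evolutionary_pg(p, seed=seed + k) for k, p in enumerate(partitions)]
    return [proto for r in results for proto in r]  # assumed reducer: concatenation


if __name__ == "__main__":
    rng = random.Random(42)
    data = [([rng.gauss(c, 0.5), rng.gauss(c, 0.5)], label)
            for label, c in ((0, 0.0), (1, 3.0)) for _ in range(200)]
    reduced = map_reduce_pg(data, n_mappers=4)
    print(len(reduced), "prototypes; 1-NN training accuracy:", accuracy(reduced, data))
```

The windowing level saves time because each fitness evaluation touches only one window instead of the whole partition, while rotating the window keeps every training instance involved over the run. In a real deployment, the list comprehension over partitions would be replaced by parallel map tasks on a MapReduce framework such as Hadoop.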
Keywords
Big Data; data reduction; parallel processing; pattern classification; classification process; combined MapReduce-windowing two-level parallel scheme; data reduction algorithm; evolutionary prototype generation techniques; large-scale classification problems; nearest neighbor classifier; nearest neighbor rule; noise sensitivity; representative point generation; storage requirements reduction; Acceleration; Big data; Computational modeling; Data mining; Prototypes; Runtime; Training
fLanguage
English
Publisher
IEEE
Conference_Title
2014 IEEE Congress on Evolutionary Computation (CEC)
Conference_Location
Beijing
Print_ISBN
978-1-4799-6626-4
Type
conf
DOI
10.1109/CEC.2014.6900490
Filename
6900490