DocumentCode
3007311
Title
Distributed Stochastic Aware Random Forests -- Efficient Data Mining for Big Data
Author
Assuncao, Jose ; Fernandes, Paulo ; Lopes, Luis ; Normey, Silvio
Author_Institution
Comput. Sci. Dept., PUCRS Univ., Porto Alegre, Brazil
fYear
2013
fDate
June 27 2013-July 2 2013
Firstpage
425
Lastpage
426
Abstract
Some top data mining algorithms, such as ensemble classifiers, may be inefficient on very large data sets. This paper makes an initial proposal for a distributed ensemble classifier algorithm for Big Data, based on the popular Random Forests. The proposed algorithm aims to improve efficiency by using a distributed processing model called MapReduce. At the same time, it aims to reduce the impact of randomness by following an algorithm called Stochastic Aware Random Forests (SARF).
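The abstract describes distributing Random Forests training with MapReduce. The following is a minimal, generic sketch of that idea, not the authors' algorithm. A map step grows trees on bootstrap samples of each data partition, and a reduce step merges the partial forests into one ensemble that predicts by unweighted majority vote. Assumptions: depth-1 trees (decision stumps) stand in for full trees to keep the example short; plain Python functions simulate the mappers and the reducer (no Hadoop); the toy data and all function names (`train_stump`, `map_partition`, `merge_forests`, `predict`, `build_toy_data`) are hypothetical; and the SARF step for reducing the impact of randomness is not modeled.

```python
# Generic sketch of MapReduce-style Random Forest training.
# NOT the paper's SARF algorithm: stumps replace full trees, and
# mappers/reducer are simulated with ordinary Python functions.
import random
from collections import Counter
from functools import reduce


def train_stump(rows, n_features, rng):
    """Fit a depth-1 decision tree (stump) on one randomly chosen feature,
    picking the threshold with the fewest misclassifications."""
    feat = rng.randrange(n_features)  # random feature subset of size 1
    best = None
    for threshold in sorted({x[feat] for x, _ in rows}):
        left = [y for x, y in rows if x[feat] <= threshold]
        right = [y for x, y in rows if x[feat] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, (feat, threshold, left_label, right_label))
    return best[1]


def map_partition(partition, trees_per_partition, n_features, seed):
    """Map step: grow trees on bootstrap samples of one data partition."""
    rng = random.Random(seed)
    trees = []
    for _ in range(trees_per_partition):
        sample = [rng.choice(partition) for _ in partition]  # bootstrap sample
        trees.append(train_stump(sample, n_features, rng))
    return trees


def merge_forests(forest_a, forest_b):
    """Reduce step: the global forest is the union of the partial forests."""
    return forest_a + forest_b


def predict(forest, x):
    """Classify x by unweighted majority vote over all trees."""
    votes = Counter(left if x[feat] <= t else right
                    for feat, t, left, right in forest)
    return votes.most_common(1)[0][0]


def build_toy_data(n, seed=0):
    """Hypothetical data: label is 1 when x0 > 0.5; x1 is a noisy copy of x0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x0 = rng.random()
        rows.append(((x0, x0 + rng.uniform(-0.05, 0.05)), int(x0 > 0.5)))
    return rows


if __name__ == "__main__":
    data = build_toy_data(200)
    partitions = [data[i::4] for i in range(4)]  # 4 simulated workers
    partial = [map_partition(p, 5, 2, seed=i) for i, p in enumerate(partitions)]
    forest = reduce(merge_forests, partial)
    accuracy = sum(predict(forest, x) == y for x, y in data) / len(data)
    print(f"{len(forest)} trees, training accuracy {accuracy:.2f}")
```

Because each partition is processed independently in the map step, the workers need no communication until the reduce step, which only concatenates tree lists. This independence is what makes ensemble methods a natural fit for MapReduce; any weighting or selection of trees (as an approach like SARF might apply) would go into the reduce or prediction step.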
Keywords
data mining; distributed processing; pattern classification; MapReduce; SARF; big data; data mining algorithm; distributed ensemble classifier algorithm; distributed processing model; distributed stochastic aware random forest; Data handling; Data mining; Data models; Data storage systems; Information management; Proposals; Stochastic processes; Big Data; Data Mining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data (BigData Congress), 2013 IEEE International Congress on
Conference_Location
Santa Clara, CA
Print_ISBN
978-0-7695-5006-0
Type
conf
DOI
10.1109/BigData.Congress.2013.68
Filename
6597172