• DocumentCode
    3007311
  • Title

    Distributed Stochastic Aware Random Forests -- Efficient Data Mining for Big Data

  • Author

    Assuncao, Jose ; Fernandes, Paulo ; Lopes, Luis ; Normey, Silvio

  • Author_Institution
    Comput. Sci. Dept., PUCRS Univ., Porto Alegre, Brazil
  • fYear
    2013
  • fDate
    June 27 2013-July 2 2013
  • Firstpage
    425
  • Lastpage
    426
  • Abstract
    Some top data mining algorithms, such as ensemble classifiers, may be inefficient on very large data sets. This paper makes an initial proposal for a distributed ensemble classifier algorithm for Big Data, based on the popular Random Forests algorithm. The proposed algorithm aims to improve efficiency by using a distributed processing model called MapReduce. At the same time, it aims to reduce the impact of randomness by following an algorithm called Stochastic Aware Random Forests (SARF).
  • Keywords
    data mining; distributed processing; pattern classification; MapReduce; SARF; big data; data mining algorithm; distributed ensemble classifier algorithm; distributed processing model; distributed stochastic aware random forest; Data handling; Data mining; Data models; Data storage systems; Information management; Proposals; Stochastic processes; Big Data; Data Mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (BigData Congress), 2013 IEEE International Congress on
  • Conference_Location
    Santa Clara, CA
  • Print_ISBN
    978-0-7695-5006-0
  • Type

    conf

  • DOI
    10.1109/BigData.Congress.2013.68
  • Filename
    6597172