• DocumentCode
    1340964
  • Title

    Scalable and Parallel Boosting with MapReduce

  • Author

    Palit, Indranil ; Reddy, Chandan K.

  • Author_Institution
    University of Notre Dame, Notre Dame
  • Volume
    24
  • Issue
    10
  • fYear
    2012
  • Firstpage
    1904
  • Lastpage
    1916
  • Abstract
    In this era of data abundance, it has become critical to process large volumes of data at much faster rates than ever before. Boosting is a powerful predictive model that has been successfully used in many real-world applications. However, due to its inherently sequential nature, achieving scalability for boosting is nontrivial and demands the development of new parallelized versions that allow it to handle large-scale data efficiently. In this paper, we propose two parallel boosting algorithms, AdaBoost.PL and LogitBoost.PL, which facilitate simultaneous participation of multiple computing nodes to construct a boosted ensemble classifier. The proposed algorithms are competitive with the corresponding serial versions in terms of generalization performance. We achieve a significant speedup since our approach does not require individual computing nodes to communicate with each other to share their data. In addition, the proposed approach allows for preserving the privacy of computations in distributed environments. We used the MapReduce framework to implement our algorithms and demonstrated their performance in terms of classification accuracy, speedup, and scaleup using a wide variety of synthetic and real-world data sets.
  • Keywords
    Algorithm design and analysis; Computational modeling; Convergence; Distributed databases; Prediction algorithms; Training; Boosting; MapReduce; classification; distributed computing; parallel algorithms
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2011.208
  • Filename
    6035709