• DocumentCode
    17997
  • Title
    How Hadoop Clusters Break
  • Author
    Rabkin, A.; Katz, Randy H.
  • Author_Institution
    Princeton Univ., Princeton, NJ, USA
  • Volume
    30
  • Issue
    4
  • fYear
    2013
  • fDate
    July-Aug. 2013
  • Firstpage
    88
  • Lastpage
    94
  • Abstract
    This article describes an examination of a sample of several hundred support tickets for the Hadoop ecosystem, a widely used group of big data storage and processing systems; a taxonomy of errors and how they are addressed by supporters; and the misconfigurations that are the dominant cause of failures. Some design "antipatterns" and missing platform features contribute to these problems. Developers can use various methods to build more robust distributed systems, thereby helping users and administrators prevent some of these rough edges.
  • Keywords
    data handling; parallel programming; Hadoop cluster; Hadoop ecosystem; data processing system; data storage system; distributed system; Analytical models; Cluster approximation; Data handling; Data storage systems; Information management; Software development; Software reliability; big data; cloud computing; distributed systems; reliability; system administration
  • fLanguage
    English
  • Journal_Title
    Software, IEEE
  • Publisher
    IEEE
  • ISSN
    0740-7459
  • Type
    jour
  • DOI
    10.1109/MS.2012.73
  • Filename
    6216347