• DocumentCode
    2821461
  • Title
    The Hadoop Distributed File System
  • Author
    Shvachko, Konstantin ; Kuang, Hairong ; Radia, Sanjay ; Chansler, Robert
  • Author_Institution
    Yahoo!, Sunnyvale, CA, USA
  • fYear
    2010
  • fDate
    3-7 May 2010
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. We describe the architecture of HDFS and report on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo!.
  • Keywords
    Internet; distributed databases; network operating systems; Hadoop distributed file system; Yahoo!; data storage; data stream; enterprise data; Bandwidth; Clustering algorithms; Computer architecture; Concurrent computing; Distributed computing; Facebook; File servers; File systems; Protection; Protocols; HDFS; Hadoop; distributed file system
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on
  • Conference_Location
    Incline Village, NV
  • Print_ISBN
    978-1-4244-7152-2
  • Electronic_ISBN
    978-1-4244-7153-9
  • Type
    conf
  • DOI
    10.1109/MSST.2010.5496972
  • Filename
    5496972