DocumentCode
2821461
Title
The Hadoop Distributed File System
Author
Shvachko, Konstantin ; Kuang, Hairong ; Radia, Sanjay ; Chansler, Robert
Author_Institution
Yahoo!, Sunnyvale, CA, USA
fYear
2010
fDate
3-7 May 2010
Firstpage
1
Lastpage
10
Abstract
The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. We describe the architecture of HDFS and report on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo!.
Keywords
Internet; distributed databases; network operating systems; Hadoop distributed file system; Yahoo!; data storage; data stream; enterprise data; Bandwidth; Clustering algorithms; Computer architecture; Concurrent computing; Distributed computing; Facebook; File servers; File systems; Protection; Protocols; HDFS; Hadoop; distributed file system
fLanguage
English
Publisher
IEEE
Conference_Titel
Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on
Conference_Location
Incline Village, NV
Print_ISBN
978-1-4244-7152-2
Electronic_ISBN
978-1-4244-7153-9
Type
conf
DOI
10.1109/MSST.2010.5496972
Filename
5496972