Abstract — HDFS is a distributed file system designed to
hold very large amounts of data (terabytes or even
petabytes), and provide high-throughput access to this
information. Files are stored in a redundant fashion across
multiple machines to ensure their durability in the event of failure and
their high availability to highly parallel applications.
This paper provides a step-by-step introduction from the
file system, to the distributed file system, and finally to the Hadoop
Distributed File System. Section I introduces what a file
system is, the need for a file system, the conventional file system and its
advantages, the need for a distributed file system, what a
distributed file system is, and the benefits of a distributed file
system. It also analyzes large datasets and compares
MapReduce with RDBMS, HPC, and Grid Computing,
communities that have been doing large-scale data processing for
years. Section II introduces the concept of the Hadoop
Distributed File System. Lastly, Section III contains the
conclusion, followed by the references.