Abstract — HDFS is a distributed file system designed to
hold very large amounts of data (terabytes or even
petabytes), and provide high-throughput access to this
information. Files are stored in a redundant fashion across
multiple machines to ensure their durability in the event of failure and
their high availability to highly parallel applications.
This paper provides a step-by-step introduction from the
file system, to the distributed file system, and finally to the Hadoop
Distributed File System. Section I introduces what a file
system is, the need for a file system, the conventional file system and its
advantages, the need for a distributed file system, what a
distributed file system is, and the benefits of a distributed file
system. It also analyzes large datasets and compares
MapReduce with RDBMS, HPC, and Grid Computing,
communities that have been doing large-scale data processing for
years. Section II introduces the concept of the Hadoop
Distributed File System. Lastly, Section III contains the
conclusion, followed by the references.