• DocumentCode
    8912
  • Title

    Modeling of Distributed File Systems for Practical Performance Analysis

  • Author

    Yongwei Wu ; Feng Ye ; Kang Chen ; Weimin Zheng

  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
  • Volume
    25
  • Issue
    1
  • fYear
    2014
  • fDate
    Jan. 2014
  • Firstpage
    156
  • Lastpage
    166
  • Abstract
    Cloud computing has received significant attention recently. Delivering quality-guaranteed services in clouds is highly desired. Distributed file systems (DFSs) are the key component of any cloud-scale data processing middleware. Evaluating the performance of DFSs is accordingly very important. To avoid the cost of late life-cycle performance fixes and architectural redesign, providing performance analysis before the deployment of DFSs is also particularly important. In this paper, we propose a systematic and practical performance analysis framework, driven by architecture and design models for defining the structure and behavior of typical master/slave DFSs. We put forward a configuration guideline for specifying the configuration alternatives of such DFSs, and a practical approach for both qualitative and quantitative performance analysis of DFSs with various configuration settings in a systematic way. What distinguishes our approach from others is that 1) most existing works rely on performance measurements under a variety of workloads/strategies, comparisons with other DFSs, or running application programs, whereas our approach is based on architecture and design level models and systematically derived performance models; 2) our approach is able to evaluate the performance of DFSs both qualitatively and quantitatively; and 3) our approach can evaluate not only the overall performance of a DFS but also that of its components and individual steps. We demonstrate the effectiveness of our approach by evaluating the Hadoop distributed file system (HDFS). A series of real-world experiments on EC2 (Amazon Elastic Compute Cloud), Tansuo, and Inspur clusters were conducted to qualitatively evaluate the effectiveness of our approach. We also performed a set of experiments with HDFS on EC2 to quantitatively analyze the performance and limitations of the metadata server of DFSs. Results show that our approach can achieve sufficient performance analysis. Similarly, the proposed approach could also be applied to evaluate other DFSs such as MooseFS, GFS, and zFS.
  • Keywords
    cloud computing; distributed databases; middleware; public domain software; software architecture; software performance evaluation; Amazon elastic compute cloud; DFS; EC2; HDFS; Hadoop distributed file system modeling; Tansuo and Inspur clusters; architectural redesign; cloud computing; cloud-scale data processing middleware; cost avoidance; late life cycle performance fixes; metadata server; performance evaluation; performance measurements; practical performance analysis framework; Analytical models; Computer architecture; Data models; Performance analysis; Software; Time factors; Unified modeling language; Distributed file system; HDFS; architecture model; practical performance analysis;
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2013.19
  • Filename
    6410316