  • DocumentCode
    1320868
  • Title

    Prediction of Optimal Parallelism Level in Wide Area Data Transfers

  • Author

    Yildirim, Esma ; Yin, Dengpan ; Kosar, Tevfik

  • Author_Institution
    Dept. of Comput. Sci. & Eng., State Univ. of New York at Buffalo, Buffalo, NY, USA
  • Volume
    22
  • Issue
    12
  • fYear
    2011
  • Firstpage
    2033
  • Lastpage
    2045
  • Abstract
    Wide area data transfer can be a major bottleneck for the end-to-end performance of distributed applications. A practical way to increase wide area throughput at the application layer is to use multiple parallel streams. Although a larger number of parallel streams may yield much better performance than a single stream, overwhelming the network by opening too many streams may have the opposite effect: the congestion created by the excess streams may cause the achieved throughput to drop. Hence, it is important to determine the optimal number of streams without congesting the network. Predicting this "optimum" number is not straightforward, since it depends on many parameters specific to each individual transfer. Generic models that try to predict this number either rely too much on historical information or fail to achieve accurate predictions. In this paper, we present a set of new models that aim to approximate the optimal number with the least historical information and the lowest prediction overhead. An algorithm is introduced to select the best combination of historical information for the prediction, both for evaluation purposes and to optimize the prediction by reducing the error rate. We measure the feasibility and accuracy of the proposed prediction models by comparing them against actual GridFTP data transfers. Using little historical information, we were able to predict the throughput of parallel streams accurately and find a very close approximation of the optimal stream number.
  • Keywords
    data communication; error statistics; grid computing; parallel processing; peer-to-peer computing; protocols; GridFTP data transfer; application layer; distributed application; end-to-end performance; error rate reduction; multiple parallel stream; optimal parallelism level; optimal stream number; parallel stream; wide area data transfer; Concurrency control; Data models; Distributed processing; Mathematical model; Network protocols; Parallel processing; Predictive models; Distributed applications; modeling and prediction; network protocols; parallelism and concurrency
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2011.228
  • Filename
    6018962