• DocumentCode
    1926005
  • Title
    Numerically stable, single-pass, parallel statistics algorithms
  • Author
    Bennett, Janine ; Grout, Ray ; Pébay, Philippe ; Roe, Diana ; Thompson, David
  • Author_Institution
    Sandia Nat. Labs., Livermore, CA, USA
  • fYear
    2009
  • fDate
    Aug. 31 2009-Sept. 4 2009
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Statistical analysis is widely used in scientific applications to analyze and infer meaning from data. A key challenge for any statistical analysis package aimed at large-scale, distributed data is addressing the orthogonal issues of parallel scalability and numerical stability. In this paper we derive a series of formulas that allow for single-pass, yet numerically robust, pairwise parallel and incremental updates of both arbitrary-order centered statistical moments and co-moments. Using these formulas, we have built an open source parallel statistics framework that performs principal component analysis (PCA) in addition to computing descriptive, correlative, and multi-correlative statistics. A scalability study demonstrates numerically stable, near-optimal scalability on up to 128 processes, and we present results in which the framework is used to process large-scale turbulent combustion simulation data with 1500 processes.
  • Keywords
    data handling; parallel algorithms; principal component analysis; numerical stability; numerically stable parallel statistics algorithms; open source parallel statistics; single-pass parallel statistics algorithms; statistical analysis package; turbulent combustion simulation data; Concurrent computing; Large-scale systems; Numerical stability; Packaging; Principal component analysis; Robustness; Scalability; Statistical analysis; Statistical distributions; Statistics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on
  • Conference_Location
    New Orleans, LA
  • ISSN
    1552-5244
  • Print_ISBN
    978-1-4244-5011-4
  • Electronic_ISBN
    1552-5244
  • Type
    conf
  • DOI
    10.1109/CLUSTR.2009.5289161
  • Filename
    5289161
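
The abstract refers to single-pass pairwise and incremental updates of centered moments. As a rough illustration only, the sketch below shows the well-known second-order case of such an update (commonly attributed to Chan, Golub, and LeVeque), not the arbitrary-order formulas derived in the paper and not code from the authors' framework. The names `Moments`, `merge`, `update`, and `from_values` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Running summary of a sample: count, mean, and second centered moment."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean


def merge(a: Moments, b: Moments) -> Moments:
    """Pairwise update: combine the summaries of two disjoint samples."""
    n = a.n + b.n
    if n == 0:
        return Moments()
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    # The cross term corrects for the two partial sums being centered
    # on different means.
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)


def update(a: Moments, x: float) -> Moments:
    """Incremental update: a single observation is a sample of size one."""
    return merge(a, Moments(1, x, 0.0))


def from_values(values) -> Moments:
    """Single pass over the data, never storing the raw sum of squares."""
    acc = Moments()
    for x in values:
        acc = update(acc, x)
    return acc
```

In a parallel setting, each process would call something like `from_values` on its local data and the per-process summaries would then be combined with `merge`, for example in a reduction tree. The sample variance is `m2 / (n - 1)` when `n > 1`.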