• DocumentCode
    3538372
  • Title

    JRBridge: A Framework of Large-Scale Statistical Computing for R

  • Author

    Xia Xie ; Jie Cao ; Hai Jin ; Xijiang Ke ; Wenzhi Cao

  • Author_Institution
    Services Comput. Technol. & Syst. Lab., Huazhong Univ. of Sci. & Technol., Wuhan, China
  • fYear
    2012
  • fDate
    6-8 Dec. 2012
  • Firstpage
    27
  • Lastpage
    34
  • Abstract
    Demand for highly scalable parallel data processing platforms is rising due to an explosion in the number of massive-scale, data-intensive applications in both industry and the sciences. Performing statistical computing over huge data repositories poses a significant challenge to existing statistical software and computational infrastructure. An analysis of various open source computational infrastructures and their programming paradigm APIs shows that most of them are JVM-based and expose their APIs as Java interfaces or abstract classes. This paper proposes a generic framework, JRBridge, which integrates R with JVM-based computational infrastructures by automatically generating Java API wrapper code around native R code and handling type conversion. Using this framework, we build a distributed statistical computing environment by integrating R with Hadoop. The Hadoop Distributed File System plug-in provides a way to store and access datasets with millions of objects, and the MapReduce plug-in provides a natural environment for coding MapReduce algorithms in R. Experimental results show that JRBridge scales linearly with dataset size and thus provides a scalable solution for large-scale statistical computing in R.
  • Keywords
    application program interfaces; mathematics computing; parallel processing; statistical analysis; API; Hadoop distributed file system; JRBridge framework; Java API code wrapper; MapReduce algorithm; large-scale statistical computing; massive-scale data intensive application; parallel data processing platform; Algorithm design and analysis; Bridges; Computational modeling; Java; Libraries; Programming; Storms; Hadoop; JVM; MapReduce; R Language; Statistical Computing Method;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Services Computing Conference (APSCC), 2012 IEEE Asia-Pacific
  • Conference_Location
    Guilin
  • Print_ISBN
    978-1-4673-4825-6
  • Type

    conf

  • DOI
    10.1109/APSCC.2012.74
  • Filename
    6478195