  • DocumentCode
    2321274
  • Title
    ParaLite: Supporting Collective Queries in Database System to Parallelize User-Defined Executable
  • Author
    Chen, Ting; Taura, Kenjiro
  • Author_Institution
    Univ. of Tokyo, Tokyo, Japan
  • fYear
    2012
  • fDate
    13-16 May 2012
  • Firstpage
    474
  • Lastpage
    481
  • Abstract
    This paper proposes extensions to parallel database systems called collective queries and User-Defined eXecutables (UDX). A collective query is an SQL query whose results are distributed to multiple clients and then processed by them in parallel using arbitrary external programs (user-defined executables). The intended applications are data-intensive workflows, typically built out of various independently developed executables and scripts. Collective queries facilitate the description of such workflows by making data-parallel execution of external programs on big data easy and streamlined. They also provide workflow developers with SQL, a familiar and powerful language, for flexible data filtering and stereotypical data processing tasks. We implement this concept in ParaLite, a parallel database system based on the popular lightweight database SQLite. ParaLite is equipped with data transfer optimization algorithms that distribute query results to multiple clients, taking both communication cost and compute load into account. We verified the correctness and performance of ParaLite; the experimental results show that ParaLite performs well on SQL processing and achieves good scalability when parallelizing UDX.
  • Keywords
    SQL; parallel databases; query processing; ParaLite; SQLite database; SQL processing; SQL query; UDX; arbitrary external programs; collective queries; communication cost; compute loads; data intensive work-flows; data parallel execution; data transfer optimization algorithms; flexible data filtering; parallel database systems; query results distribution; stereotypical data processing tasks; user-defined executable; Data processing; Database systems; Open source software; Relational databases; Standards; Syntactics; Collective Query; Parallel database system; User-Defined Executable;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Cluster, Cloud and Grid Computing (CCGrid), 2012 12th IEEE/ACM International Symposium on
  • Conference_Location
    Ottawa, ON
  • Print_ISBN
    978-1-4673-1395-7
  • Type
    conf
  • DOI
    10.1109/CCGrid.2012.74
  • Filename
    6217456