• DocumentCode
    654993
  • Title

    Matrix-Query: A Distributed SQL-Like Query Processing Model for Large Database Clusters

  • Author

    Qiao Liu ; Ping Ji ; Yuan Zuo

  • Author_Institution
    Dept. of Comput. Sci. & Eng., BeiHang Univ., Beijing, China
  • fYear
    2013
  • fDate
    10-12 Oct. 2013
  • Firstpage
    179
  • Lastpage
    185
  • Abstract
    With the development of distributed computation and the rapid growth of data, scientific research increasingly requires the support of a high-efficiency relational data processing framework. Based on the characteristics of scientific data, such as bulk inserts and infrequent changes, this paper proposes a streaming processing model called Matrix-Query, together with a matching data storage architecture for relational queries. By transforming the original relational schema into entities and key-value indexes, the data storage solution allows more operations to run locally and makes data easier to locate. Compared to the traditional Map-Reduce model, Matrix-Query isolates subtasks from one another so that they execute in a streaming and parallel manner, and it reduces the negative impact of writing intermediate files. We also optimize the data structure and subtask management to improve the performance of Matrix-Query. The experimental results demonstrate the performance advantage of Matrix-Query over two well-known data processing systems, Hive and HadoopDB, which are built on top of the Map-Reduce model.
  • Keywords
    SQL; database indexing; distributed databases; natural sciences computing; query processing; relational databases; very large databases; HadoopDB; Hive; Map-Reduce model; Matrix-Query; bulk insert; data positioning; data processing system; data storage architecture; data structure optimization; distributed SQL-like query processing model; distributed computation; high-efficiency relational data processing framework; key-value indexing; large database clusters; localization operation; relational query; relational schema; scientific data; streaming processing model; subtask management; Computational modeling; Data models; Distributed databases; Indexing; Memory; Query processing; relational query processing model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2013 International Conference on
  • Conference_Location
    Beijing
  • Type

    conf

  • DOI
    10.1109/CyberC.2013.36
  • Filename
    6685677