• DocumentCode
    2386294
  • Title
    Integrating DBMSs as a Read-Only Execution Layer into Hadoop
  • Author
    An, Mingyuan ; Wang, Yang ; Wang, Weiping ; Sun, Ninghui
  • Author_Institution
    Key Lab. of Comput. Syst. & Archit., Grad. Univ. of Chinese Acad. of Sci., Beijing, China
  • fYear
    2010
  • fDate
    8-11 Dec. 2010
  • Firstpage
    17
  • Lastpage
    26
  • Abstract
    To obtain the efficiency of a DBMS, HadoopDB combines Hadoop with DBMSs and claims superior performance over Hadoop. However, HadoopDB's approach simply layers MapReduce on top of unmodified single-node DBMSs, which has several obvious weaknesses. In essence, HadoopDB is a parallel DBMS with fault tolerance, and it incurs unnecessary overhead due to the DBMS legacy. Instead of augmenting a DBMS with Hadoop techniques, we propose a new system architecture that integrates modified DBMS engines into Hadoop as a read-only execution layer, where the DBMS provides efficient read-only operators rather than managing the data. Besides the efficiency gained from the DBMS engine, there are other advantages. The modified DBMS engine can directly process data from HDFS (Hadoop Distributed File System) files at the block level, so data replication is handled naturally by HDFS and block-level parallelism is easily achieved. A global index access mechanism is added in accordance with the MapReduce paradigm. Data loading speed is also ensured by writing the data directly into HDFS with simplified logic. Experiments show that our system outperforms both the original Hadoop and a HadoopDB-style system.
  • Keywords
    data handling; fault tolerant computing; information retrieval; parallel databases; software architecture; DBMS engine; HDFS; Hadoop distributed file system; HadoopDB; MapReduce; block-level parallelism; data processing; database management system; fault tolerance; index access; parallel DBMS; read-only execution layer; single-machined DBMS; system architecture; Engines; Fault tolerance; Fault tolerant systems; Indexes; Loading; Parallel processing; Hadoop; database; global index access; large-scale data processing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2010 International Conference on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4244-9110-0
  • Electronic_ISBN
    978-0-7695-4287-4
  • Type
    conf
  • DOI
    10.1109/PDCAT.2010.43
  • Filename
    5704399