• DocumentCode
    120142
  • Title

    Balancing scalability, performance and fault tolerance for structured data (BSPF)

  • Author

    Khalid, Amir ; Afzal, Hassan ; Aftab, Shoohira

  • Author_Institution
    Dept. of Comput. Software Eng., Nat. Univ. of Sci. & Technol., Islamabad, Pakistan
  • fYear
    2014
  • fDate
    16-19 Feb. 2014
  • Firstpage
    725
  • Lastpage
    732
  • Abstract
    Analytical business applications generate reports that give trend-predicting insight into an organization's future by estimating financial graphs and risk factors. These applications work on huge amounts of data, comprising decades of market and company records and an organization's decision logs. Today, the volume of big data is approaching zettabytes, and structured data makes up only 20% of today's data. 20% of a gigabyte may be negligible in comparison to big data, but 20% of big data itself cannot be neglected. Traditional data management tools are ill-suited to running cross-table analytical queries on structured data in a distributed processing environment; their response times are high because of ill-aligned data sets and the complex hierarchy of the distributed computing environment. Data alignment requires a complete shift in the data deployment paradigm from a row-oriented storage layout to a column-oriented storage layout, and the complex hierarchy of the distributed computing environment can be handled by keeping metadata for the entire data set. The paper proposes an approach to ease the deployment of structured data into the distributed processing environment by arranging data into column-wise combinational entities. Response time to analytical queries can be lowered with the support of two concepts: shared architecture and multi-path query execution. Highly scalable systems are based on the Shared Nothing architecture, but degraded performance and fault tolerance are side effects that come with high scalability. The proposed method is an effort to strike a balance between scalability, performance and fault tolerance. Due to the limited scope of this paper, we concentrate on issues and solutions for structured data only. Shared architecture and active backup help improve the system's performance by sharing the work-load-per-node.
    BSPF's clustering methodology sheds the data pressure points to minimize the data loss per node crash.
  • Keywords
    Big Data; cloud computing; data structures; fault tolerant computing; pattern clustering; BSPF clustering methodology; active backup; analytical business applications; column oriented storage layout; column-wise combinational entities; data alignment; data deployment paradigm; data loss minimization; data management tools; distributed computing environment; distributed processing environment; fault tolerance balancing; financial graph estimation; ill-aligned data sets; metadata; multipath query execution; performance balancing; risk factor estimation; row oriented storage layout; scalability balancing; shared architecture; shared nothing architecture; structured data deployment; system performance improvement; work-load-per-node sharing; Computer architecture; Computer crashes; Distributed databases; Indexes; Information management; Layout; Peer-to-peer computing; Distributed and Cloud Computing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Communication Technology (ICACT), 2014 16th International Conference on
  • Conference_Location
    Pyeongchang
  • Print_ISBN
    978-89-968650-2-5
  • Type

    conf

  • DOI
    10.1109/ICACT.2014.6779058
  • Filename
    6779058