Balancing scalability, performance and fault tolerance for structured data (BSPF)

Author

Khalid, Amir ; Afzal, Hassan ; Aftab, Shoohira

Author_Institution

Dept. of Comput. Software Eng., Nat. Univ. of Sci. & Technol., Islamabad, Pakistan

fYear

2014

fDate

16-19 Feb. 2014

Firstpage

725

Lastpage

732

Abstract

Analytical business applications generate reports that give trend-predicting insight into an organization's future by estimating financial graphs and risk factors. These applications work on huge amounts of data, comprising decades of market and company records and an organization's decision logs. Today, big data is approaching the zettabyte scale, and structured data makes up only 20% of it. 20% of a gigabyte may be negligible in comparison to big data, but 20% of big data itself cannot be neglected. Traditional data management tools perform poorly when running cross-table analytical queries on structured data in a distributed processing environment; their response times are high because of ill-aligned data sets and the complex hierarchy of the distributed computing environment. Data alignment requires a complete shift in the data deployment paradigm from a row-oriented storage layout to a column-oriented storage layout, and the complex hierarchy of the distributed computing environment can be handled by keeping metadata for the entire data set. This paper proposes an approach to ease the deployment of structured data into the distributed processing environment by arranging data into column-wise combinational entities. Response time to analytical queries can be lowered with the support of two concepts: shared architecture and multipath query execution. Highly scalable systems are based on Shared Nothing architecture, but degraded performance and fault tolerance are side effects of high scalability. The proposed method is an effort to balance scalability, performance and fault tolerance. Due to the limited scope of this paper, we concentrate on issues and solutions for structured data only. Shared architecture and active backup help improve the system's performance by sharing the workload per node. BSPF's clustering methodology relieves data pressure points to minimize the data loss per node crash.

Keywords

Big Data; cloud computing; data structures; fault tolerant computing; pattern clustering; BSPF clustering methodology; active backup; analytical business applications; big data; cloud computing; column oriented storage layout; column-wise combinational entities; data alignment; data deployment paradigm; data loss minimization; data management tools; distributed computing environment; distributed processing environment; fault tolerance balancing; financial graph estimation; ill-aligned data sets; metadata; multipath query execution; performance balancing; risk factor estimation; row oriented storage layout; scalability balancing; shared architecture; shared nothing architecture; structured data deployment; system performance improvement; work-load-per-node sharing; Computer architecture; Computer crashes; Distributed databases; Indexes; Information management; Layout; Peer-to-peer computing; Big data; Distributed and Cloud Computing;

fLanguage

English

Publisher

IEEE

Conference_Titel

Advanced Communication Technology (ICACT), 2014 16th International Conference on

Conference_Location

Pyeongchang

Print_ISBN

978-89-968650-2-5

Type

conf

DOI

10.1109/ICACT.2014.6779058

Filename

6779058

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=120142