DocumentCode :
1665459
Title :
Tree Matching Using Data Shaping
Author :
Shukla, Parijat ; Somani, Arun K.
Author_Institution :
Dept. of Electr. & Comput. Eng., Iowa State Univ., Ames, IA, USA
fYear :
2015
Firstpage :
166
Lastpage :
173
Abstract :
Real-time Big Data analytics has become important for meeting business and other decision-making needs in many complex applications. A significant portion of such data is available and stored in semi-structured form, for which a tree-based organization is commonly used. Tree matching is a core component of many applications, such as fraud detection, spam filtering, information visualization and extraction, user authentication, natural language processing, XML databases, and bioinformatics. Comparing ordered (or unordered) trees is compute-intensive, particularly for Big Data. To facilitate the comparison of ordered trees, in this paper we address the problem of shaping semi-structured data to enable time-efficient processing on contemporary hardware such as a GPGPU (General Purpose Graphics Processing Unit) and Intel MIC (a multi-core processor). Specifically, our data shaping approach enables pre-computation of partial edit distance values in parallel. We evaluate our work using real-world data sets. Our experimental results show that our SIMT-based PTED-GPU (Parallel Tree Edit Distance using GPU) implementation achieves a speedup of up to 12X over the state of the art in tree edit distance (TED) computation.
Keywords :
Big Data; data analysis; tree data structures; GPGPU; INTEL MIC; SIMT-based PTED-GPU; TED computation; business; contemporary hardware; decision making; general purpose graphics processing unit; multicore processors; ordered trees; parallel tree edit distance; real time Big Data analytics; real world data sets; semistructured data shaping; time efficient processing; tree matching; tree-based organization; Big data; Encoding; Graphics processing units; Hardware; Instruction sets; Time complexity; Vegetation; Big Data; Data Analytics; Data Shaping; GPGPU; Parallel Processing; Tree Edit Distance; Tree Matching;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data (BigData Congress), 2015 IEEE International Congress on
Conference_Location :
New York, NY
Print_ISBN :
978-1-4673-7277-0
Type :
conf
DOI :
10.1109/BigDataCongress.2015.32
Filename :
7207216