DocumentCode
1665459
Title
Tree Matching Using Data Shaping
Author
Shukla, Parijat ; Somani, Arun K.
Author_Institution
Dept. of Electr. & Comput. Eng., Iowa State Univ., Ames, IA, USA
fYear
2015
Firstpage
166
Lastpage
173
Abstract
Real-time Big Data analytics has become important for meeting business and other decision-making needs in many complex applications. A significant portion of such data is available and stored in semi-structured form, for which a tree-based organization is commonly used. Tree matching is a core component of many applications, such as fraud detection, spam filtering, information visualization and extraction, user authentication, natural language processing, XML databases, and bioinformatics. Comparing ordered (or unordered) trees is compute-intensive, particularly for Big Data. To facilitate the comparison of ordered trees, this paper addresses the problem of shaping semi-structured data to enable time-efficient processing on contemporary hardware such as a GPGPU (General Purpose Graphics Processing Unit) and INTEL MIC (a multi-core processor). Specifically, our data shaping approach enables pre-computation of partial edit distance values in parallel. We evaluate our work using real-world data sets. Our experimental results show that our SIMT-based PTED-GPU (Parallel Tree Edit Distance using GPU) implementation achieves a speedup of up to 12X over the state of the art in tree edit distance (TED) computation.
Keywords
Big Data; data analysis; tree data structures; GPGPU; INTEL MIC; SIMT-based PTED-GPU; TED computation; business; contemporary hardware; decision making; general purpose graphics processing unit; multicore processors; ordered trees; parallel tree edit distance; real time Big Data analytics; real world data sets; semistructured data shaping; time efficient processing; tree matching; tree-based organization; Big data; Encoding; Graphics processing units; Hardware; Instruction sets; Time complexity; Vegetation; Big Data; Data Analytics; Data Shaping; GPGPU; Parallel Processing; Tree Edit Distance; Tree Matching;
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data (BigData Congress), 2015 IEEE International Congress on
Conference_Location
New York, NY
Print_ISBN
978-1-4673-7277-0
Type
conf
DOI
10.1109/BigDataCongress.2015.32
Filename
7207216