Title :
Fast parameterized word matching on compressed text
Author :
Garg, Radhika ; Prasad, Ranga ; Agarwal, Sankalp
Author_Institution :
Ajay Kumar Garg Eng. Coll., Ghaziabad, India
Abstract :
Two strings P[1...m] and T[1...n] with m ≤ n are said to be a parameterized match (p-match) if one can be transformed into the other via some bijective mapping. Parameterized matching is mainly used in software maintenance, plagiarism detection, and detecting isomorphism in graphs. In the compressed parameterized matching problem, the task is to find all parameterized occurrences of a pattern in the compressed text without decompressing it. Compressing the text before matching reduces its size and also reduces the matching time. In this paper, we focus on parameterized word matching on compressed text, where both the pattern and the text are compressed before the actual matching is performed. For compressing the pattern and text, we use an efficient compression code, Word Based Tagged Code (WBTC). Experimental results show that our algorithm is up to three times faster than searching the uncompressed text.
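The p-match definition above can be illustrated with a minimal Python sketch. This is not the paper's algorithm: it runs on uncompressed word sequences, does not implement WBTC, and assumes that every symbol is a parameter (no fixed symbols). The names `p_match` and `p_match_positions` are hypothetical and chosen only for this illustration.

```python
def p_match(a, b):
    """Return True if sequences a and b are related by a bijective symbol mapping.

    Assumption: every symbol is treated as a parameter (no fixed symbols).
    """
    if len(a) != len(b):
        return False
    forward, backward = {}, {}
    for x, y in zip(a, b):
        # setdefault stores the first pairing seen and returns the stored one later,
        # so a mismatch in either direction means the mapping is not a bijection.
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


def p_match_positions(pattern, text):
    """Naive scan: 0-based start positions where a window of text p-matches pattern."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if p_match(pattern, text[i:i + m])]


words = "x y x z x z".split()
print(p_match_positions(["a", "b", "a"], words))  # prints [0, 2, 3]
```

The scan checks each window independently, so it costs O(nm) symbol comparisons in the worst case; it is meant only to make the bijection condition concrete, not to reflect the speed-ups reported for compressed text.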
Keywords :
codes; data compression; pattern matching; text analysis; WBTC; bijective mapping; compressed parameterized matching problem; compression code; fast parameterized word matching; isomorphism detection; p-match; plagiarism detection; software maintenance; text compression; word based tagged code; Algorithm design and analysis; Approximation algorithms; Indexes; Pattern matching; Transforms; Vocabulary; Compressed parameterized matching; String matching; compressed pattern matching; information retrieval and word based tagged code;
Conference_Title :
Computer and Communication Technology (ICCCT), 2014 International Conference on
Conference_Location :
Allahabad
Print_ISBN :
978-1-4799-6757-5
DOI :
10.1109/ICCCT.2014.7001512