Title :
Maximal pattern matching with flexible wildcard gaps and one-off constraint
Author :
Dahiya, Arzoo ; Garg, Deepak
Author_Institution :
Dept. of Comput. Sci. & Eng., Thapar Univ., Patiala, India
Abstract :
Pattern matching is a fundamental operation in extracting knowledge from large amounts of biosequence data. Finding patterns helps in analyzing the properties of a sequence. This paper focuses on the problem of maximal pattern matching with flexible wildcard gaps and length constraints under the one-off condition. The problem is to find the maximum number of occurrences of a pattern P in a biological sequence S, with a user-specified wildcard gap between every two consecutive letters of P, under the one-off condition and a constraint on the overall length of each matching occurrence. Obtaining an optimal solution to this problem is difficult. We propose a heuristic algorithm, MOGO, based on the Nettree data structure to solve it. Theoretical analysis and experimental results demonstrate that MOGO performs better than existing algorithms in most cases when tested on real-world biological sequences.
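The problem statement in the abstract can be illustrated with a minimal sketch. This is not MOGO and does not use a Nettree; it is a plain greedy left-to-right search with backtracking, written only to make the constraints concrete. It assumes the one-off condition means each position of S may be used by at most one occurrence, that the gap is the number of wildcard positions between consecutive pattern letters, and that the length constraint bounds the span from the first to the last matched position. The function name and the parameters `min_gap`, `max_gap`, `min_len`, and `max_len` are hypothetical.

```python
def greedy_one_off_matches(seq, pattern, min_gap, max_gap, min_len, max_len):
    """Greedily collect occurrences of `pattern` in `seq`.

    Assumptions (not taken from the paper):
      - between consecutive pattern letters there are min_gap..max_gap wildcards;
      - the span of an occurrence (last - first + 1) lies in [min_len, max_len];
      - one-off condition: each position of `seq` is used at most once.
    Returns a list of occurrences, each a list of 0-based positions.
    The result is a heuristic lower bound, not necessarily the maximum.
    """
    if not pattern:
        return []

    used = [False] * len(seq)
    occurrences = []

    def extend(positions):
        # Try to complete a partial occurrence, leftmost choice first.
        k = len(positions)
        if k == len(pattern):
            span = positions[-1] - positions[0] + 1
            return positions if span >= min_len else None
        last = positions[-1]
        lo = last + 1 + min_gap
        hi = min(last + 1 + max_gap, len(seq) - 1)
        for nxt in range(lo, hi + 1):
            if nxt - positions[0] + 1 > max_len:
                break  # later positions only make the span longer
            if used[nxt] or seq[nxt] != pattern[k]:
                continue
            result = extend(positions + [nxt])
            if result is not None:
                return result
        return None

    for start in range(len(seq)):
        if used[start] or seq[start] != pattern[0]:
            continue
        occ = extend([start])
        if occ is not None:
            for p in occ:
                used[p] = True  # enforce the one-off condition
            occurrences.append(occ)
    return occurrences


# Pattern "AT" with 1 to 2 wildcards between A and T, span between 2 and 4.
print(greedy_one_off_matches("AATT", "AT", 1, 2, 2, 4))  # [[0, 2], [1, 3]]
```

Because the search commits to the leftmost feasible occurrence for each start position, it can miss the true maximum on some inputs; that gap is what heuristics for this problem aim to narrow.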
Keywords :
bioinformatics; constraint handling; pattern matching; sequences; tree data structures; MOGO; Nettree data structure; biosequence data; flexible wildcard gaps; heuristic algorithm; length constraints; matching occurrence; maximal pattern matching; one-off constraint; optimal solution; real world biological sequences; Accuracy; Algorithm design and analysis; Biological information theory; Complexity theory; Heuristic algorithms; Gap; One-off; Wildcard
Conference_Title :
Advances in Computing, Communications and Informatics (ICACCI), 2014 International Conference on
Conference_Location :
New Delhi
Print_ISBN :
978-1-4799-3078-4
DOI :
10.1109/ICACCI.2014.6968285