Title :
Iterative Refinement of Repeat Sequence Specification Using Constrained Pattern Matching
Author :
He, Dan ; Arslan, Abdullah N. ; He, Yu ; Wu, Xindong
Author_Institution :
Univ. of Vermont, Burlington
Abstract :
Repeated sequences in a genome are structures that indicate important biological functions such as protein binding, and they are associated with various genetic diseases. We consider the problem of finding a specification for a "significant" repeating pattern in a given sequence. A significant pattern carries a high amount of information and has many non-overlapping repeats. For this problem, we propose a method that takes as input an initial specification for a repeating pattern. A pattern is specified by a sequence of letters separated by variable-length wildcards. The method presents to the user maximal occurrences of the current pattern specification such that no text symbol is shared as a letter by two different pattern occurrences. This reduces the begin-end position-overlaps among different occurrences. The user then modifies the specification manually to eliminate overlapping repeats. This process continues until a specification for a significant pattern is obtained.
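The pattern model in the abstract (letters separated by variable-length wildcards, with no text symbol used as a letter by two occurrences) can be illustrated with a small sketch. This is not the authors' method: the keywords suggest a maximum-flow formulation, whereas the code below uses a simple left-to-right greedy selection, so it may report fewer occurrences than an optimal method would. The names `find_occurrence` and `greedy_letter_disjoint`, and the `(min, max)` gap encoding, are assumptions made for illustration.

```python
from typing import List, Optional, Set, Tuple

Gap = Tuple[int, int]  # assumed encoding: (min, max) wildcards between two letters


def find_occurrence(text: str, letters: str, gaps: List[Gap],
                    start: int, used: Set[int]) -> Optional[List[int]]:
    """Return the letter positions of one occurrence whose first letter
    sits at `start`, skipping positions already used as letters."""

    def extend(i: int, pos: int) -> Optional[List[int]]:
        # The text symbol must match and must not already serve as a letter.
        if text[pos] != letters[i] or pos in used:
            return None
        if i == len(letters) - 1:
            return [pos]
        lo, hi = gaps[i]
        # Try every allowed wildcard count, shortest gap first.
        for g in range(lo, hi + 1):
            nxt = pos + g + 1
            if nxt >= len(text):
                break
            rest = extend(i + 1, nxt)
            if rest is not None:
                return [pos] + rest
        return None

    if not letters or start >= len(text):
        return None
    return extend(0, start)


def greedy_letter_disjoint(text: str, letters: str,
                           gaps: List[Gap]) -> List[List[int]]:
    """Greedy heuristic (not an optimal or flow-based selection): scan start
    positions left to right and keep each occurrence whose letters are unused."""
    used: Set[int] = set()
    occurrences: List[List[int]] = []
    for start in range(len(text)):
        occ = find_occurrence(text, letters, gaps, start, used)
        if occ is not None:
            occurrences.append(occ)
            used.update(occ)
    return occurrences


# Pattern: A, then 0-2 wildcards, G, then 0-1 wildcards, T
print(greedy_letter_disjoint("ACGTAGT", "AGT", [(0, 2), (0, 1)]))
```

Each returned list holds the text positions matched by the pattern's letters. Wildcard positions are not marked as used, so two occurrences may still interleave; only letter positions are kept disjoint, which mirrors the constraint stated in the abstract.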
Keywords :
biological techniques; cellular biophysics; diseases; genetics; molecular biophysics; proteins; begin-end position-overlaps; constrained pattern matching; current pattern specification; genetic diseases; genome; iterative refinement; nonoverlapping repeats; protein binding; repeat sequence specification; repeating pattern; Bioinformatics; Biology; Computer science; Diseases; Genetics; Genomics; Helium; Humans; Pattern matching; Proteins; edge disjoint path; maximum flow; pattern matching with wildcards; repeats; vertex-disjoint path;
Conference_Title :
Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
Conference_Location :
Boston, MA
Print_ISBN :
978-1-4244-1509-0
DOI :
10.1109/BIBE.2007.4375715