Title :
A Fast Method for Determining the Repeat Pattern Size in DNA Sequences
Author :
Zhou, Hong-Xia ; Yan, Hong
Author_Institution :
City Univ. of Hong Kong, Kowloon
Abstract :
Tandem repeats occur frequently in the human genome. Their functions are still largely unclear, but some have been shown to cause human disease, and some are associated with regulatory functions. Detecting tandem repeats is therefore of considerable significance. Identifying a tandem repeat in genomic sequence data is difficult because the length of the repeat pattern is not known in advance and because a tandem repeat may contain indels and substitutions. In this paper, we propose an efficient algorithm based on the autoregressive (AR) model. We analyze the residual errors of AR models of different orders fitted to a DNA sequence. From the changes in the residual errors, we determine whether the sequence contains a tandem repeat and, if so, its pattern size. Examples show that the algorithm detects not only exact tandem repeats but also approximate ones.
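The idea of comparing AR residual errors across model orders can be illustrated with a minimal sketch. It is not the authors' procedure, which the abstract does not specify in detail. The sketch assumes a binary indicator encoding (one 0/1 sequence per base), a least-squares AR fit with a small ridge term to keep the normal equations solvable, and a simple decision rule that picks the order at which the summed residual drops the most. All function names and parameters (`indicator_signals`, `ar_residual`, `residual_curve`, `estimate_pattern_size`, `ridge`, `max_order`) are hypothetical.

```python
def indicator_signals(seq):
    """Map a DNA string to four 0/1 sequences, one per base (assumed encoding)."""
    return [[1.0 if ch == base else 0.0 for ch in seq.upper()] for base in "ACGT"]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ar_residual(x, order, start, ridge=1e-6):
    """Mean squared one-step prediction error of a least-squares AR(order) fit.

    Targets are x[start:], so every order is scored on the same samples.
    Order 0 means "predict zero" and serves as the baseline.
    """
    targets = range(start, len(x))
    y = [x[n] for n in targets]
    if order == 0:
        return sum(t * t for t in y) / len(y)
    rows = [[x[n - k] for k in range(1, order + 1)] for n in targets]
    # Normal equations (X^T X + ridge * I) a = X^T y; the ridge term is an
    # assumption of this sketch that avoids a singular matrix on exact repeats.
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(order)] for i in range(order)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(order)]
    coef = solve(A, b)
    err = sum((t - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, t in zip(rows, y))
    return err / len(y)


def residual_curve(seq, max_order):
    """Residual error summed over the four indicator signals, for orders 0..max_order."""
    signals = indicator_signals(seq)
    return [sum(ar_residual(x, p, max_order) for x in signals)
            for p in range(max_order + 1)]


def estimate_pattern_size(seq, max_order=8):
    """Return the order with the largest residual drop (assumed decision rule).

    Requires len(seq) > max_order.
    """
    curve = residual_curve(seq, max_order)
    drops = [curve[p - 1] - curve[p] for p in range(1, max_order + 1)]
    return 1 + max(range(len(drops)), key=lambda i: drops[i])


if __name__ == "__main__":
    print(estimate_pattern_size("ACGT" * 12))  # exact repeat with pattern size 4
```

For an exact repeat, the residual stays high until the order reaches the pattern size and then falls to nearly zero. For an approximate repeat with substitutions or indels the drop is expected to be smaller, so a practical detector would also need a threshold on the size of the drop to decide whether a repeat is present at all.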
Keywords :
DNA; autoregressive processes; genetics; DNA sequences; autoregressive model; genomic sequence data; human disease; human genome; regulatory functions; repeat pattern size determination; tandem repeat detection; Bioinformatics; DNA; Diseases; Frequency; Genomics; Humans; Machine learning; Pattern analysis; Sequences; Testing; Autoregressive model; Pattern size; Residual error; Tandem repeat;
Conference_Title :
Machine Learning and Cybernetics, 2007 International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-0973-0
Electronic_ISBN :
978-1-4244-0973-0
DOI :
10.1109/ICMLC.2007.4370720