DocumentCode
1936122
Title
A Fast Method for Determining the Repeat Pattern Size in DNA Sequences
Author
Zhou, Hong-Xia ; Yan, Hong
Author_Institution
City Univ. of Hong Kong, Kowloon
Volume
6
fYear
2007
fDate
19-22 Aug. 2007
Firstpage
3314
Lastpage
3318
Abstract
Tandem repeats occur frequently in the human genome. Their functions are still largely unclear, but some have been shown to cause human disease and to be associated with regulatory functions. Detecting tandem repeats is therefore of considerable significance. Because the length of the repeat pattern is not known in advance and a tandem repeat may contain indels and substitutions, identifying a tandem repeat in genomic sequence data is a difficult task. In this paper, an efficient algorithm based on the autoregressive (AR) model is proposed. We analyze the residual errors of AR models of different orders fitted to a DNA sequence. From the changes in the residual errors, we can determine whether a sequence contains a tandem repeat and what its pattern size is. Examples show that the algorithm can detect not only exact tandem repeats but also approximate ones.
Keywords
DNA; autoregressive processes; genetics; DNA sequences; autoregressive model; genomic sequence data; human disease; human genome; regulatory functions; repeat pattern size determination; tandem repeat detection; Bioinformatics; Diseases; Frequency; Genomics; Humans; Machine learning; Pattern analysis; Sequences; Testing; Pattern size; Residual error; Tandem repeat
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2007 International Conference on
Conference_Location
Hong Kong
Print_ISBN
978-1-4244-0973-0
Electronic_ISBN
978-1-4244-0973-0
Type
conf
DOI
10.1109/ICMLC.2007.4370720
Filename
4370720