DocumentCode :
863652
Title :
Genomewide motif identification using a dictionary model
Author :
Sabatti, Chiara ; Lange, Kenneth
Author_Institution :
Human Genetics & Stat. Departments, California Univ., Los Angeles, CA, USA
Volume :
90
Issue :
11
fYear :
2002
fDate :
11/1/2002
Firstpage :
1803
Lastpage :
1810
Abstract :
This paper surveys and extends models and algorithms for identifying binding sites in noncoding regions of DNA. Binding sites control the transcription of genes into messenger RNA in preparation for translation into proteins. The base sequence of most binding sites is not entirely fixed, with the different permitted spellings collectively constituting a "motif." After summarizing the underlying biological issues, we review three different models for binding site identification. Each model was developed with a different type of dataset as reference. We then present a unified model that borrows from the previous ones and integrates their main features. In our unified model, one can identify motifs and their unknown positions along a sequence. One can also fit the model to data using maximum likelihood and maximum a posteriori algorithms. These algorithms rely on recursive formulas and the maximization/minorization principle. Finally, we conclude with a prospectus of future data analyses and theoretical research.
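The "recursive formulas" mentioned in the abstract can be illustrated with a forward recursion over sequence positions. The following is a minimal sketch and not the authors' implementation. It assumes a dictionary model in which a sequence is a concatenation of independently drawn words, each word being a motif with its own per-position letter probabilities. The names `Motif`, `segment_prob`, `sequence_likelihood`, and `usage` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: a motif is a list of per-position
# letter distributions, one dict per position.
Motif = List[Dict[str, float]]


def segment_prob(motif: Motif, segment: str) -> float:
    """Probability that `motif` spells out `segment` (equal lengths assumed)."""
    prob = 1.0
    for column, letter in zip(motif, segment):
        prob *= column.get(letter, 0.0)
    return prob


def sequence_likelihood(seq: str, motifs: Sequence[Motif],
                        usage: Sequence[float]) -> float:
    """Sum over all segmentations of `seq` into words.

    forward[i] is the probability of generating seq[:i] as a
    concatenation of complete words; usage[w] is the assumed
    probability of drawing motif w at each step.
    """
    forward = [0.0] * (len(seq) + 1)
    forward[0] = 1.0  # the empty prefix has probability one
    for i in range(1, len(seq) + 1):
        total = 0.0
        for motif, q in zip(motifs, usage):
            k = len(motif)
            if k <= i:  # the word must fit inside the prefix
                total += q * segment_prob(motif, seq[i - k:i]) * forward[i - k]
        forward[i] = total
    return forward[len(seq)]


# Single-letter "background" word plus one fixed two-letter word "AC".
background: Motif = [{"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}]
word_ac: Motif = [{"A": 1.0}, {"C": 1.0}]
likelihood = sequence_likelihood("AC", [background, word_ac], [0.5, 0.5])
```

Here "AC" can be read either as two background letters or as one occurrence of the two-letter word, and the recursion adds both contributions in time linear in the sequence length.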
Keywords :
DNA; biology computing; genetics; physiological models; proteins; binding sites; expectation-maximization algorithm; gene transcription; genomic sequence; maximum a posteriori algorithms; maximum likelihood algorithms; messenger RNA; permitted spellings; text segmentation; unknown positions along sequence; Bioinformatics; Biological cells; Biological system modeling; Dictionaries; Genomics; Humans; Sequences; Statistics
fLanguage :
English
Journal_Title :
Proceedings of the IEEE
Publisher :
IEEE
ISSN :
0018-9219
Type :
jour
DOI :
10.1109/JPROC.2002.804689
Filename :
1046958