DocumentCode
863652
Title
Genomewide motif identification using a dictionary model
Author
Sabatti, Chiara ; Lange, Kenneth
Author_Institution
Human Genetics & Stat. Departments, California Univ., Los Angeles, CA, USA
Volume
90
Issue
11
fYear
2002
fDate
11/1/2002
Firstpage
1803
Lastpage
1810
Abstract
This paper surveys and extends models and algorithms for identifying binding sites in noncoding regions of DNA. Binding sites control the transcription of genes into messenger RNA in preparation for translation into proteins. The base sequence of most binding sites is not entirely fixed, with the different permitted spellings collectively constituting a "motif." After summarizing the underlying biological issues, we review three different models for binding site identification. Each model was developed with a different type of dataset as reference. We then present a unified model that borrows from the previous ones and integrates their main features. In our unified model, one can identify motifs and their unknown positions along a sequence. One can also fit the model to data using maximum likelihood and maximum a posteriori algorithms. These algorithms rely on recursive formulas and the maximization/minorization principle. Finally, we conclude with a prospectus of future data analyses and theoretical research.
Keywords
DNA; biology computing; genetics; physiological models; proteins; binding sites; expectation-maximization algorithm; genes transcription; genomic sequence; maximum a posteriori algorithms; maximum likelihood algorithms; messenger RNA; permitted spellings; text segmentation; unknown positions along sequence; Bioinformatics; Biological cells; Biological system modeling; Dictionaries; Genomics; Humans; Sequences; Statistics
fLanguage
English
Journal_Title
Proceedings of the IEEE
Publisher
IEEE
ISSN
0018-9219
Type
jour
DOI
10.1109/JPROC.2002.804689
Filename
1046958