  • DocumentCode
    863652
  • Title
    Genomewide motif identification using a dictionary model
  • Author
    Sabatti, Chiara ; Lange, Kenneth
  • Author_Institution
    Human Genetics & Stat. Departments, California Univ., Los Angeles, CA, USA
  • Volume
    90
  • Issue
    11
  • fYear
    2002
  • fDate
    11/1/2002
  • Firstpage
    1803
  • Lastpage
    1810
  • Abstract
    This paper surveys and extends models and algorithms for identifying binding sites in noncoding regions of DNA. Binding sites control the transcription of genes into messenger RNA in preparation for translation into proteins. The base sequence of most binding sites is not entirely fixed, and the different permitted spellings collectively constitute a "motif." After summarizing the underlying biological issues, we review three different models for binding site identification, each developed with a different type of dataset in mind. We then present a unified model that borrows from the previous ones and integrates their main features. In our unified model, one can identify motifs and their unknown positions along a sequence. One can also fit the model to data using maximum likelihood and maximum a posteriori algorithms. These algorithms rely on recursive formulas and the minorization-maximization (MM) principle. Finally, we conclude with a prospectus of future data analyses and theoretical research.
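    The record does not spell out the recursive formulas. As an illustration only, the Python sketch below assumes a simple dictionary model in which a sequence is a concatenation of independently drawn words, word w having probability p_w, and computes the sequence likelihood with a forward recursion that sums over all segmentations. The function name `dictionary_likelihood` and the toy dictionary are hypothetical; this is not the authors' unified model, which also handles motif spellings and positions.

```python
from typing import Dict, List


def dictionary_likelihood(seq: str, word_probs: Dict[str, float]) -> float:
    """Likelihood of `seq` under an assumed dictionary model.

    Assumption (illustrative only): a sequence is a concatenation of words
    drawn independently, word w with probability word_probs[w]. The
    likelihood is the sum, over every segmentation of `seq` into
    dictionary words, of the product of the word probabilities.
    """
    n = len(seq)
    # forward[i] holds the total probability of all segmentations of seq[:i].
    forward: List[float] = [0.0] * (n + 1)
    forward[0] = 1.0  # the empty prefix has exactly one (empty) segmentation
    for i in range(1, n + 1):
        total = 0.0
        # Every word that ends exactly at position i extends a
        # segmentation of the shorter prefix seq[:i - len(word)].
        for word, p in word_probs.items():
            k = len(word)
            if k <= i and seq[i - k:i] == word:
                total += forward[i - k] * p
        forward[i] = total
    return forward[n]
```

    For example, with the toy dictionary {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.2, "TATA": 0.1}, the sequence TATA has two segmentations, T|A|T|A and TATA, so the function returns 0.0036 + 0.1 = 0.1036 up to floating-point rounding. For long genomic sequences this plain recursion underflows; a practical version would rescale or work in log space.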
  • Keywords
    DNA; biology computing; genetics; physiological models; proteins; binding sites; expectation-maximization algorithm; gene transcription; genomic sequence; maximum a posteriori algorithms; maximum likelihood algorithms; messenger RNA; permitted spellings; text segmentation; unknown positions along sequence; Bioinformatics; Biological cells; Biological system modeling; Dictionaries; Genetics; Genomics; Humans; Sequences; Statistics
  • fLanguage
    English
  • Journal_Title
    Proceedings of the IEEE
  • Publisher
    IEEE
  • ISSN
    0018-9219
  • Type
    jour
  • DOI
    10.1109/JPROC.2002.804689
  • Filename
    1046958