• DocumentCode
    2682876
  • Title
    Reconstructing latent periods in genome sequences with insertions and deletions
  • Author
    Arora, Raman ; Dewey, Colin ; Sethares, William A.

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Wisconsin-Madison, Madison, WI, USA
  • fYear
    2009
  • fDate
    17-21 May 2009
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Tandem and latent repeats in genome sequences provide insight into their various structural and functional roles. Such regions in genome sequences are modeled as cyclostationary processes, generated by a collection of information sources in a cyclic manner. Maximum likelihood (ML) estimates can be easily generated for the cyclostationary profiles and for the statistical period of such subsequences. However, in the presence of insertions and deletions, the ML estimators become much less accurate at identifying the periods. This paper extends the cyclic model to a profile hidden Markov model (PHMM) to account for insertions and deletions. An iterative algorithm is developed to learn the parameters of the PHMM, and the Viterbi algorithm is employed to find the most likely path through the state space. This reconstructs likely insertions and deletions in the sequence and results in better estimates of the statistical period and cyclostationary profiles than the ML approach. Experimental results are provided with simulated sequences as well as with the chromosome 1 sequence from the human genome.
  • Keywords
    genomics; hidden Markov models; maximum likelihood estimation; molecular biophysics; state-space methods; Viterbi algorithm; cyclostationary process; genome sequence; latent period reconstruction; maximum likelihood estimation method; profile hidden Markov model; state space method; Bioinformatics; Biomedical engineering; DNA; Fourier transforms; Genomics; Hidden Markov models; Iterative algorithms; Maximum likelihood estimation; Random variables; Sequences;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Genomic Signal Processing and Statistics, 2009. GENSIPS 2009. IEEE International Workshop on
  • Conference_Location
    Minneapolis, MN
  • Print_ISBN
    978-1-4244-4761-9
  • Electronic_ISBN
    978-1-4244-4762-6
  • Type
    conf

  • DOI
    10.1109/GENSIPS.2009.5174377
  • Filename
    5174377
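
The abstract's baseline (ML estimation of a cyclic profile and its period, without insertions or deletions) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the DNA alphabet constant, and the BIC-style penalty used to choose among candidate periods are all assumptions made for illustration; the record does not say how the paper selects the period. The last lines show why a single deletion hurts this baseline: it shifts the phase of everything after it, so the fixed-phase profile fits worse.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumed DNA alphabet


def ml_profile(seq, period):
    """ML estimate of a cyclic profile: per-phase empirical symbol frequencies."""
    counts = [Counter() for _ in range(period)]
    for i, symbol in enumerate(seq):
        counts[i % period][symbol] += 1
    profile = []
    for c in counts:
        total = sum(c.values())
        profile.append({a: c[a] / total for a in ALPHABET})
    return profile


def log_likelihood(seq, profile):
    """Log-likelihood of seq when position i is drawn from profile[i % period]."""
    period = len(profile)
    return sum(math.log(profile[i % period][s]) for i, s in enumerate(seq))


def estimate_period(seq, max_period):
    """Pick the period maximizing a penalized log-likelihood.

    The raw likelihood never decreases at multiples of the true period,
    so a BIC-style penalty (an assumption of this sketch, not taken from
    the paper) is subtracted to favor the smallest adequate period.
    """
    n = len(seq)
    best_period, best_score = None, -math.inf
    for period in range(1, max_period + 1):
        ll = log_likelihood(seq, ml_profile(seq, period))
        n_params = period * (len(ALPHABET) - 1)
        score = ll - 0.5 * n_params * math.log(n)
        if score > best_score:
            best_period, best_score = period, score
    return best_period


clean = "ACGTT" * 40                   # synthetic sequence with period 5
corrupted = clean[:100] + clean[101:]  # same sequence with one deletion

print(estimate_period(clean, max_period=12))
print(log_likelihood(clean, ml_profile(clean, 5)))
print(log_likelihood(corrupted, ml_profile(corrupted, 5)))
```

A profile HMM of the kind the abstract describes would add insert and delete states so that such phase shifts can be absorbed by the model; that extension is not sketched here.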