  • DocumentCode
    2323801
  • Title
    Generic spaced DNA motif discovery using Genetic Algorithm
  • Author
    Chan, Tak-Ming ; Leung, Kwong-Sak ; Lee, Kin-Hong ; Lio, Pietro
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, Shatin, China
  • fYear
    2010
  • fDate
    18-23 July 2010
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    DNA motif discovery is an important problem for deciphering gene regulation. Motifs usually contain gaps (spaced motifs) and are more complex than contiguously conserved (monad) patterns. Existing algorithms mostly address monad motifs, and methods for spaced motifs impose various constraints on gaps, which may hinder the discovery of complex motifs. In this paper, we propose a Genetic Algorithm (GA) for Spaced Motifs Elicitation on Nucleotides (GASMEN), which searches over a wide range of possible widths (4-25) and substantially relaxes these constraints. GASMEN employs submotif indexing to partition the search space into smaller subspaces, so that the GA can reach optimality more easily. Multiple-motif control is employed, and probabilistic refinements are proposed to improve motif quality. Preliminary results on real spaced motifs show that GASMEN is promising for finding more accurate motifs and optimal widths than the state-of-the-art method, SPACE. GASMEN is also capable of finding monad motifs, outperforming both Weeder and SPACE on most of the 8 real datasets.
  • Keywords
    biology computing; genetic algorithms; genetics; search problems; gene regulation; generic spaced DNA motif discovery; genetic algorithm; multiple-motif control; nucleotides; search space; spaced motifs elicitation; submotif indexing; Aerospace electronics; DNA; Hamming distance; Probabilistic logic; Pulse width modulation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Evolutionary Computation (CEC), 2010 IEEE Congress on
  • Conference_Location
    Barcelona
  • Print_ISBN
    978-1-4244-6909-3
  • Type
    conf
  • DOI
    10.1109/CEC.2010.5585924
  • Filename
    5585924