• DocumentCode
    2989882
  • Title
    Genetic algorithm for dimer-led and error-restricted spaced motif discovery
  • Author
    Tak-Ming Chan ; Leung-Yau Lo ; Man-Leung Wong ; Yong Liang ; Kwong-Sak Leung
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, Hong Kong, China
  • fYear
    2013
  • fDate
    16-19 April 2013
  • Firstpage
    198
  • Lastpage
    205
  • Abstract
    DNA motif discovery is an important problem in deciphering protein-DNA binding in gene regulation. To discover generic spaced motifs, which have multiple conserved patterns separated by wild-cards called spacers, the genetic algorithm (GA) based GASMEN has been proposed and shown to outperform related methods. However, the over-generic modeling of any number of spacers increases the optimization difficulty in practice. In protein-DNA binding case studies, complicated spaced motifs are rare, while dimers with a single spacer are the more common spaced motifs. Moreover, errors (mismatches) in a conserved pattern are not arbitrarily distributed, as certain highly conserved nucleotides are essential to maintain binding. Motivated by better optimization in real applications, we have developed a new method, GA for Dimer-led and Error-restricted Spaced Motifs (GADESM). Common spaced motifs receive special attention through dimer-led initialization of the population. Results on real datasets show that the dimer-led initialization in GADESM achieves better fitness than GASMEN with statistical significance. With additional error-restricted motif occurrence retrieval, GADESM performs better than GASMEN on both comprehensive simulation data and a real ChIP-seq case study.
  • Keywords
    DNA; genetic algorithms; genetics; molecular biophysics; proteins; statistical analysis; DNA motif discovery; GA-for-dimer-led and error-restricted spaced motifs; GASMEN; conserved nucleotides; deciphering protein-DNA bindings; dimer-led initialization; gene regulation; genetic algorithm; over-generic modeling; Data models; Optimization; Sociology; Statistics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2013 IEEE Symposium on
  • Conference_Location
    Singapore
  • Type
    conf
  • DOI
    10.1109/CIBCB.2013.6595409
  • Filename
    6595409