• DocumentCode
    2915040
  • Title

    An Estimation of Distribution Algorithm for Motif Discovery

  • Author

    Li, Gang ; Chan, Tak-Ming ; Leung, Kwong-Sak ; Lee, Kin-Hong

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, Shatin
  • fYear
    2008
  • fDate
    1-6 June 2008
  • Firstpage
    2411
  • Lastpage
    2418
  • Abstract
    Transcription Factor Binding Site identification, or motif discovery, is the problem of identifying motif binding sites in the cis-regulatory regions of DNA sequences. Biological experiments are expensive, and the problem is computationally NP-hard. We propose an Estimation of Distribution Algorithm for Motif Discovery (EDAMD). We use Bayesian analysis to derive a fitness function that measures the posterior probability of a set of motif instances and can handle a variable number of motif instances in the sequences. EDAMD adopts a Gaussian distribution to model the distribution of the sets of motif instances, which can capture the bivariate correlations among the positions of motif instances. When a new Position Frequency Matrix (PFM) is generated from the Gaussian distribution, a new set of motif instances is identified from the PFM via the Greedy Refinement operation. At the end of each generation, the Gaussian distribution is updated with the sets of motif instances. Since Greedy Refinement assumes a single motif instance per sequence, a Post Processing operation based on the fitness function is used to find additional motif instances after the evolution. Experiments show that EDAMD is comparable to or better than GAME and GALF on the real problems tested in this paper.
  • Keywords
    Bayes methods; DNA; Gaussian distribution; optimisation; Bayesian analysis; DNA sequences; NP-hard problem; distribution algorithm; distribution algorithm estimation; greedy refinement operation; motif binding sites; motif discovery; position frequency matrix; posterior probability; transcription factor binding sites identification; Bayesian methods; Biology computing; Evolution (biology); Frequency; Organisms; Proteins; Sequences; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Evolutionary Computation, 2008. CEC 2008. (IEEE World Congress on Computational Intelligence). IEEE Congress on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    978-1-4244-1822-0
  • Electronic_ISBN
    978-1-4244-1823-7
  • Type

    conf

  • DOI
    10.1109/CEC.2008.4631120
  • Filename
    4631120
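
The abstract says that a new set of motif instances is identified from a Position Frequency Matrix (PFM) via Greedy Refinement, which assumes a single motif instance per sequence. The record gives no further detail, so the following is only a minimal sketch of that idea under stated assumptions, not the paper's implementation. The function names, the pseudocount, the log-probability score, and the toy sequences are all hypothetical.

```python
import math

BASES = "ACGT"


def build_pfm(instances, pseudocount=1.0):
    """Column-wise base frequencies of equal-length motif instances.

    The pseudocount is an assumption of this sketch; it keeps every
    frequency non-zero so that log scores stay finite.
    """
    width = len(instances[0])
    pfm = []
    for col in range(width):
        counts = {base: pseudocount for base in BASES}
        for inst in instances:
            counts[inst[col]] += 1
        total = sum(counts.values())
        pfm.append({base: counts[base] / total for base in BASES})
    return pfm


def log_score(window, pfm):
    """Log-probability of a window under the PFM (hypothetical scoring choice)."""
    return sum(math.log(col[base]) for base, col in zip(window, pfm))


def greedy_refine(sequences, pfm):
    """For each sequence, keep the single best-scoring window under the PFM."""
    width = len(pfm)
    picks = []
    for seq in sequences:
        starts = range(len(seq) - width + 1)
        best = max(starts, key=lambda i: log_score(seq[i:i + width], pfm))
        picks.append((best, seq[best:best + width]))
    return picks


# Toy data, invented for illustration only.
sequences = ["GGTATAAACC", "CCCTATAAAG", "TATAAAGGGG"]
pfm = build_pfm(["TATAAA", "TATAAA", "TATATA"])
picks = greedy_refine(sequences, pfm)
print(picks)
```

Each entry of `picks` is a `(start, instance)` pair, one per sequence. The Gaussian model over sets of motif instances, the Bayesian fitness function, and the Post Processing step that finds additional instances are not sketched here, because the abstract does not specify them in enough detail.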