• DocumentCode
    2823563
  • Title

    Short adjacent repeat identification based on Chemical Reaction Optimization

  • Author

    Xu, Jin ; Lam, Albert Y S ; Li, Victor O K ; Li, Qiwei ; Fan, Xiaodan

  • Author_Institution
    Dept. of Electr. & Electron. Eng., Univ. of Hong Kong, Hong Kong, China
  • fYear
    2012
  • fDate
    10-15 June 2012
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    The analysis of short tandem repeats (STRs) in DNA sequences has become an attractive method for determining the genetic profile of an individual. Here we focus on a more general and practical problem, the short adjacent repeats identification problem (SARIP), which extends STR identification by allowing short gaps between neighboring units. Presently, the best available solution to SARIP is BASARD, which uses Markov chain Monte Carlo algorithms to determine the posterior estimate. However, its computational complexity and its tendency to get stuck in a local mode lower the efficiency of BASARD and impede its wide application. In this paper, we prove that SARIP is NP-hard and solve it with Chemical Reaction Optimization (CRO), a recently developed metaheuristic approach. CRO mimics the interactions of molecules in a chemical reaction and can explore the solution space efficiently to find optimal or near-optimal solutions. We test the CRO algorithm on both synthetic and real data and compare its performance in mode searching with BASARD. Simulation results show that the computational time of CRO is dozens of times, or even a hundred times, shorter than that of BASARD. The results also show that CRO can obtain the global optima most of the time. Moreover, CRO is more stable across runs, which is of great importance in practical use. Thus, CRO is by far the best method for SARIP.
  • Keywords
    Bayes methods; DNA; Markov processes; Monte Carlo methods; biology computing; computational complexity; estimation theory; genetics; optimisation; BASARD; CRO algorithm; DNA sequences; Markov chain Monte Carlo algorithms; NP-hard; SARIP; STR; adjacent repeat identification; attractive method; chemical reaction optimization; genetic profile; hundred times shorter computational time; metaheuristic approach; mode searching; near optimal solution; posterior estimate; real data; short adjacent repeats identification problem; short tandem repeats; solution space; synthetic data; Chemicals; Optimization; Polynomials; Silicon; Tin; Vectors; Chemical Reaction Optimization; Short adjacent repeats; maximum a posteriori
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Evolutionary Computation (CEC), 2012 IEEE Congress on
  • Conference_Location
    Brisbane, QLD
  • Print_ISBN
    978-1-4673-1510-4
  • Electronic_ISBN
    978-1-4673-1508-1
  • Type

    conf

  • DOI
    10.1109/CEC.2012.6256614
  • Filename
    6256614