• DocumentCode
    3394266
  • Title

    Classifying synthetic and biological DNA sequences with side effect machines

  • Author

    Ashlock, Daniel ; Warner, Elizabeth

  • Author_Institution
    Dept. of Math. & Stat., Univ. of Guelph, Guelph, ON
  • fYear
    2008
  • fDate
    15-17 Sept. 2008
  • Firstpage
    22
  • Lastpage
    29
  • Abstract
    Finite state machines are routinely used to efficiently recognize patterns in strings. The internal state structure of the machine is typically only of peripheral interest, appearing in algorithms only when the number of states is minimized in the interests of efficiency of execution or comparison. A side effect machine saves information about the internal transitions of the state machine. This record of internal state transitions forms an induced feature set for any string run through the side effect machine. In this study the number of times a machine passes through each state is used as a numerical feature set for classification. Finite state machines are trained with an evolutionary algorithm to produce feature sets that are very easy for an unsupervised learning algorithm, k-means clustering, to learn. The system is demonstrated on synthetic and biological data. The biological data are PCR primers classified by their success at amplification. The parameters (number of states, population size, and mutation rates) are explored to characterize their effect on performance. Side effect machines are found to be effective at recognizing classes of DNA sequence data.
  • Keywords
    DNA; molecular biophysics; biological DNA sequences; evolutionary algorithm; internal state structure; k-means clustering; side effect machines; Automata; Chaos; Classification algorithms; Clustering algorithms; DNA; Evolution (biology); Evolutionary computation; Machine learning; Sequences; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence in Bioinformatics and Computational Biology, 2008. CIBCB '08. IEEE Symposium on
  • Conference_Location
    Sun Valley, ID
  • Print_ISBN
    978-1-4244-1778-0
  • Electronic_ISBN
    978-1-4244-1779-7
  • Type

    conf

  • DOI
    10.1109/CIBCB.2008.4675755
  • Filename
    4675755
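
The abstract describes using the number of times a machine passes through each state as a numerical feature vector. A minimal sketch of that counting step is given below. It is an illustration only, not the authors' implementation: the function name `state_visit_counts`, the 3-state transition table `EXAMPLE_MACHINE`, the choice of state 0 as the start state, and the convention of counting the state entered after each symbol are all assumptions made for this example. The evolutionary training and the k-means clustering mentioned in the abstract are not shown.

```python
from typing import Dict, List, Sequence


def state_visit_counts(
    transitions: Sequence[Dict[str, int]],
    sequence: str,
    start: int = 0,
) -> List[int]:
    """Run `sequence` through a finite state machine and count state visits.

    `transitions[s][base]` gives the next state when the machine is in
    state `s` and reads `base`. The returned list has one counter per
    state. Assumption: a visit is counted for the state entered after
    each symbol, so the counts sum to the sequence length.
    """
    counts = [0] * len(transitions)
    state = start
    for base in sequence:
        state = transitions[state][base]
        counts[state] += 1
    return counts


# Hypothetical hand-written 3-state machine over the DNA alphabet.
# In the paper the machines are evolved; this table is invented here
# purely to show the mechanics. Reading C or G moves the machine
# toward state 2, while reading A or T resets it to state 0.
EXAMPLE_MACHINE = [
    {"A": 0, "C": 1, "G": 1, "T": 0},  # state 0
    {"A": 0, "C": 2, "G": 2, "T": 0},  # state 1
    {"A": 0, "C": 2, "G": 2, "T": 0},  # state 2
]

gc_rich = state_visit_counts(EXAMPLE_MACHINE, "GCGCGGCC")
at_rich = state_visit_counts(EXAMPLE_MACHINE, "ATATTAAT")

print(gc_rich)  # [0, 1, 7]
print(at_rich)  # [8, 0, 0]
```

With this invented machine, a GC-rich string and an AT-rich string produce well-separated count vectors, which is the kind of feature set a clustering algorithm such as k-means could then group.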