  • DocumentCode
    2135559
  • Title
    Side effect machines for sequence classification
  • Author
    Ashlock, Daniel ; Warner, Elizabeth
  • Author_Institution
    Math. & Stat., Univ. of Guelph, Guelph, ON
  • fYear
    2008
  • fDate
    4-7 May 2008
  • Abstract
    Finite state machines are routinely used to efficiently recognize patterns in strings. The internal state structure of the machine is typically only of peripheral interest, appearing in algorithms only when the number of states is minimized in the interests of efficiency of execution or comparison. A side effect machine saves information about the internal transitions of the state machine. This record of internal state transitions forms an induced feature set for the string run through the machine. In this study the number of times a machine passes through each state is used as a numerical feature set for classification. Finite state machines are trained with an evolutionary algorithm to produce feature sets that are very easy for an unsupervised learning algorithm, k-means clustering, to learn. The system is demonstrated on a collection of synthetic DNA sequences with bounded randomness. The parameters, number of states, population size, and mutation rates, are explored to characterize their effect on performance. The machines achieve perfect classification on easy examples and good classification on more difficult examples. Parameter choice has a substantial impact on performance.
  • Keywords
    DNA; biology computing; evolutionary computation; finite state machines; pattern classification; string matching; DNA sequence classification; evolutionary algorithm; finite state machine; internal state structure; internal state transition; k-means clustering; numerical feature set; pattern recognition; side effect machine; unsupervised learning algorithm; Automata; Chaos; Clustering algorithms; DNA; Evolutionary computation; Fractals; Mathematics; Sequences; Statistics; Visualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Electrical and Computer Engineering, 2008. CCECE 2008. Canadian Conference on
  • Conference_Location
    Niagara Falls, ON
  • ISSN
    0840-7789
  • Print_ISBN
    978-1-4244-1642-4
  • Electronic_ISBN
    0840-7789
  • Type
    conf
  • DOI
    10.1109/CCECE.2008.4564782
  • Filename
    4564782
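
The feature extraction described in the abstract (counting how often a machine passes through each state) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `state_visit_counts`, the two-state machine `GC_MACHINE`, the choice of start state, and the convention of counting a state each time it is entered (so the start state is not counted before the first symbol) are all assumptions made for the example. The evolutionary training and the k-means clustering step are omitted.

```python
from typing import Dict, List

# A machine is a list of states; each state maps an input symbol to the
# index of the next state. This representation is an assumption made for
# illustration, not taken from the paper.
Machine = List[Dict[str, int]]


def state_visit_counts(machine: Machine, sequence: str, start: int = 0) -> List[int]:
    """Run `sequence` through `machine` and count how often each state is entered."""
    counts = [0] * len(machine)
    state = start
    for symbol in sequence:
        state = machine[state][symbol]  # follow the transition for this symbol
        counts[state] += 1              # record the side effect: one more visit
    return counts


# Hypothetical two-state machine: state 0 is entered on A or T,
# state 1 is entered on C or G, regardless of the current state.
GC_MACHINE: Machine = [
    {"A": 0, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 1, "G": 1, "T": 0},
]

features = state_visit_counts(GC_MACHINE, "ACGTGG")
print(features)  # [2, 4]
```

With this hand-built machine the feature vector simply counts A/T versus C/G symbols. An evolved machine with more states would yield a longer vector, one entry per state, that a clustering algorithm could then consume.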