DocumentCode
3394266
Title
Classifying synthetic and biological DNA sequences with side effect machines
Author
Ashlock, Daniel ; Warner, Elizabeth
Author_Institution
Dept. of Math. & Stat., Univ. of Guelph, Guelph, ON
fYear
2008
fDate
15-17 Sept. 2008
Firstpage
22
Lastpage
29
Abstract
Finite state machines are routinely used to efficiently recognize patterns in strings. The internal state structure of the machine is typically only of peripheral interest, appearing in algorithms only when the number of states is minimized in the interests of efficiency of execution or comparison. A side effect machine saves information about the internal transitions of the state machine. This record of internal state transitions forms an induced feature set for any string run through the side effect machine. In this study the number of times a machine passes through each state is used as a numerical feature set for classification. Finite state machines are trained with an evolutionary algorithm to produce feature sets that are very easy for an unsupervised learning algorithm, k-means clustering, to learn. The system is demonstrated on synthetic and biological data. The biological data are PCR primers classified by their success at amplification. The parameters (number of states, population size, and mutation rates) are explored to characterize their effect on performance. Side effect machines are found to be effective at recognizing classes of DNA sequence data.
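The state-visit feature described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the transition-table encoding, the hypothetical three-state machine TOY_MACHINE, the function name state_visit_counts, and the choice to count the state entered after each base while not counting the start state. The evolutionary training and the k-means clustering step are not shown.

```python
from typing import Dict, List

# Assumed encoding: one dict per state, mapping each DNA base to the next state.
Machine = List[Dict[str, int]]


def state_visit_counts(machine: Machine, sequence: str, start: int = 0) -> List[int]:
    """Run `sequence` through `machine` and count how often each state is entered.

    The start state itself is not counted (an assumption), so the counts
    always sum to the length of the sequence.
    """
    counts = [0] * len(machine)
    state = start
    for base in sequence:
        state = machine[state][base]  # follow the transition for this base
        counts[state] += 1            # the "side effect": record the visit
    return counts


# Hypothetical 3-state machine over the alphabet A, C, G, T (illustration only).
TOY_MACHINE: Machine = [
    {"A": 0, "C": 1, "G": 1, "T": 0},
    {"A": 2, "C": 1, "G": 1, "T": 2},
    {"A": 0, "C": 1, "G": 2, "T": 0},
]

if __name__ == "__main__":
    # Each sequence maps to one fixed-length feature vector, one entry per state.
    for seq in ["ACGT", "GGCC", "ATAT"]:
        print(seq, state_visit_counts(TOY_MACHINE, seq))
```

Under these assumptions, every sequence yields a vector whose length equals the number of states, so vectors from sequences of different lengths could be passed to a clustering routine, possibly after dividing by sequence length.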
Keywords
DNA; molecular biophysics; biological DNA sequences; evolutionary algorithm; internal state structure; k-means clustering; side effect machines; Automata; Chaos; Classification algorithms; Clustering algorithms; Evolution (biology); Evolutionary computation; Machine learning; Sequences; Testing
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence in Bioinformatics and Computational Biology, 2008. CIBCB '08. IEEE Symposium on
Conference_Location
Sun Valley, ID
Print_ISBN
978-1-4244-1778-0
Electronic_ISBN
978-1-4244-1779-7
Type
conf
DOI
10.1109/CIBCB.2008.4675755
Filename
4675755