DocumentCode :
3394266
Title :
Classifying synthetic and biological DNA sequences with side effect machines
Author :
Ashlock, Daniel ; Warner, Elizabeth
Author_Institution :
Dept. of Math. & Stat., Univ. of Guelph, Guelph, ON
fYear :
2008
fDate :
15-17 Sept. 2008
Firstpage :
22
Lastpage :
29
Abstract :
Finite state machines are routinely used to efficiently recognize patterns in strings. The internal state structure of the machine is typically only of peripheral interest, appearing in algorithms only when the number of states is minimized in the interests of efficiency of execution or comparison. A side effect machine saves information about the internal transitions of the state machine. This record of internal state transitions forms an induced feature set for any string run through the side effect machine. In this study the number of times a machine passes through each state is used as a numerical feature set for classification. Finite state machines are trained with an evolutionary algorithm to produce feature sets that are very easy for an unsupervised learning algorithm, k-means clustering, to learn. The system is demonstrated on synthetic and biological data. The biological data are PCR primers classified by their success at amplification. Three parameters (number of states, population size, and mutation rate) are explored to characterize their effect on performance. Side effect machines are found to be effective at recognizing classes of DNA sequence data.
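The feature extraction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `state_visit_counts`, the two-state `gc_machine` table, and the choice to count the state entered after each symbol (rather than also counting the start state) are all assumptions made for illustration. The evolutionary training and the k-means clustering step are omitted.

```python
from typing import Dict, List


def state_visit_counts(
    transitions: List[Dict[str, int]],
    sequence: str,
    start: int = 0,
) -> List[int]:
    """Run a sequence through a finite state machine and count state visits.

    transitions[s][base] gives the next state when the machine is in
    state s and reads base. The returned list holds one count per state
    and serves as the numerical feature vector for the sequence.
    Assumption: a visit is recorded each time a state is entered after
    reading a symbol; the start state itself is not counted.
    """
    counts = [0] * len(transitions)
    state = start
    for base in sequence:
        state = transitions[state][base]
        counts[state] += 1
    return counts


# Hypothetical hand-written machine (not evolved): the machine sits in
# state 1 after reading G or C, and in state 0 after reading A or T.
gc_machine = [
    {"A": 0, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 1, "G": 1, "T": 0},
]

features = state_visit_counts(gc_machine, "GATTACA")
print(features)
```

Each sequence yields one such vector with as many entries as the machine has states; in the approach the abstract describes, vectors of this kind are what the clustering algorithm receives.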
Keywords :
DNA; molecular biophysics; biological DNA sequences; evolutionary algorithm; internal state structure; k-means clustering; side effect machines; Automata; Chaos; Classification algorithms; Clustering algorithms; DNA; Evolution (biology); Evolutionary computation; Machine learning; Sequences; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence in Bioinformatics and Computational Biology, 2008. CIBCB '08. IEEE Symposium on
Conference_Location :
Sun Valley, ID
Print_ISBN :
978-1-4244-1778-0
Electronic_ISBN :
978-1-4244-1779-7
Type :
conf
DOI :
10.1109/CIBCB.2008.4675755
Filename :
4675755