Title :
Stochastic Finite Automata for the translation of DNA to protein
Author :
Lin, Tsau-Young ; Shah, Asmi H.
Author_Institution :
San Jose State Univ. (SJSU), San Jose, CA, USA
Abstract :
The use of Statistical Finite Automata (SFA) has been explored for understanding DNA sequences; many studies focus on local patterns, namely partial representations of DNA sequences. In this paper, we focus on global and complete representations to understand the patterns in whole DNA sequences. Obviously, DNA sequences are not random. Based on Kolmogorov complexity theory, there should be some simple Turing machines that write out such sequences; here simple means that the Turing machine is less complex than the data it writes. The primary goal of this paper is to approximate such simple Turing machines by SFA. We use SFA, via the ALERGIA algorithm (in the light of granular computing), to capture and analyze the translation process (DNA to protein) based on a chemical property of amino acids, viz. polarity. This, in turn, enables the understanding of interspecies DNA comparisons and the creation of a phylogeny, the 'tree of life'.
Keywords :
Big Data; Turing machines; biology computing; finite automata; granular computing; learning (artificial intelligence); proteins; stochastic automata; ALERGIA algorithm; DNA sequence partial representations; DNA translation; Kolmogorov complexity theory; SFA; amino acid chemical property; light granular computing; protein; stochastic finite automata; tree of life; Amino acids; Approximation methods; DNA; Merging; bigdata; machine learning; pattern;
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/BigData.2014.7004340