Title :
Learning local languages and their application to DNA sequence analysis
Author :
Yokomori, Takashi ; Kobayashi, Satoshi
Author_Institution :
Dept. of Math., Waseda Univ., Tokyo, Japan
fDate :
10/1/1998
Abstract :
This paper presents an efficient algorithm for learning in the limit, from positive data, a special class of regular languages called strictly locally testable languages, and applies it to identifying the protein α-chain region in amino acid sequences. First, we present a linear time algorithm that, given a strictly locally testable language, learns its deterministic finite state automaton in the limit from only positive data. This provides a practical and efficient method for learning a specific concept domain of sequence analysis. We then describe several experimental results obtained with this learning algorithm. Following a theoretical observation which strongly suggests that a certain type of amino acid sequence can be expressed by a locally testable language, we apply the learning algorithm to identifying the protein α-chain region in amino acid sequences for hemoglobin. Experimental scores show an overall success rate of 95% correct identification for positive data and 96% for negative data.
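The idea of learning a strictly locally testable language from positive data can be illustrated with a minimal sketch. This is the textbook construction for strictly k-testable languages and is not taken from the paper itself: the class name `KTestableModel`, the helper `learn`, the parameter `k`, and the toy strings are all assumptions made for illustration. The learner records the length k-1 prefixes and suffixes and the length-k substrings seen in the positive examples, and accepts a string only if all of its own pieces have been seen.

```python
from dataclasses import dataclass, field


@dataclass
class KTestableModel:
    """Hypothetical learner for strictly k-testable languages (illustration only)."""
    k: int
    prefixes: set = field(default_factory=set)  # observed prefixes of length k-1
    suffixes: set = field(default_factory=set)  # observed suffixes of length k-1
    factors: set = field(default_factory=set)   # observed substrings of length k
    short: set = field(default_factory=set)     # observed strings shorter than k

    def update(self, s: str) -> None:
        """Fold one positive example into the current hypothesis."""
        k = self.k
        if len(s) < k:
            self.short.add(s)
            return
        self.prefixes.add(s[:k - 1])
        self.suffixes.add(s[len(s) - (k - 1):])
        for i in range(len(s) - k + 1):
            self.factors.add(s[i:i + k])

    def accepts(self, s: str) -> bool:
        """Accept s only if every local piece of s was seen in some example."""
        k = self.k
        if len(s) < k:
            return s in self.short
        return (
            s[:k - 1] in self.prefixes
            and s[len(s) - (k - 1):] in self.suffixes
            and all(s[i:i + k] in self.factors for i in range(len(s) - k + 1))
        )


def learn(samples, k):
    """Build a hypothesis from an iterable of positive examples (assumes k >= 1)."""
    model = KTestableModel(k)
    for s in samples:
        model.update(s)
    return model


# Toy usage over the alphabet {a, b}; the strings are made up for illustration.
model = learn(["ab", "abab"], k=2)
print(model.accepts("ababab"))  # generalizes beyond the examples seen
print(model.accepts("aab"))     # contains the unseen substring "aa"
```

In this sketch each example is processed in a single pass, so for a fixed `k` the update cost grows linearly with the length of the example, and the hypothesis only ever grows as more positive data arrives. Negative examples are never needed to build the hypothesis.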
Keywords :
DNA; biology computing; deterministic automata; finite automata; formal languages; learning systems; pattern recognition; proteins; α-chain region; DNA sequence analysis; amino acid; finite state automaton; hemoglobin; learning algorithm; linear time algorithm; machine learning; protein; strictly locally testable languages; Algorithm design and analysis; Amino acids; Learning automata; Polynomials; Sequences; Splicing; Testing
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on