DocumentCode :
1157629
Title :
The applicability of recurrent neural networks for biological sequence analysis
Author :
Hawkins, John ; Bodén, Mikael
Author_Institution :
Sch. of Inf. Technol. & Electr. Eng., Queensland Univ., Brisbane, Qld., Australia
Volume :
2
Issue :
3
Year :
2005
Firstpage :
243
Lastpage :
253
Abstract :
Selection of machine learning techniques requires a certain sensitivity to the requirements of the problem. In particular, the problem can be made more tractable by deliberately using algorithms that are biased toward solutions of the requisite kind. In this paper, we argue that recurrent neural networks have a natural bias toward a problem domain of which biological sequence analysis tasks are a subset. We use experiments with synthetic data to illustrate this bias. We then demonstrate that this bias can be exploited, using a data set of protein sequences containing several classes of subcellular localization targeting peptides. The results show that, compared with feed-forward networks, recurrent neural networks generally perform better on sequence analysis tasks. Furthermore, as the patterns within the sequence become more ambiguous, the choice of specific recurrent architecture becomes more critical.
Keywords :
biology computing; cellular biophysics; feedforward neural nets; learning (artificial intelligence); molecular biophysics; molecular configurations; proteins; recurrent neural nets; biological sequence analysis; feed forward recurrent neural networks; machine learning; protein sequences; recurrent neural networks; subcellular localization targeting peptides; Feedforward neural networks; Feeds; Hidden Markov models; Machine learning; Machine learning algorithms; Neural networks; Peptides; Proteins; Recurrent neural networks; Sequences; Index Terms: Machine learning; bias; biological sequence analysis; classifier design; motif; neural network architecture; pattern recognition; recurrent neural network; subcellular localization; Algorithms; Amino Acid Sequence; Gene Expression Profiling; Molecular Sequence Data; Multigene Family; Neural Networks (Computer); Pattern Recognition, Automated; Proteins; Sequence Analysis, Protein; Subcellular Fractions
Language :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2005.44
Filename :
1504688