Sequence learning using the adaptive suffix trie algorithm

Author

Gunasinghe, Upuli ; Alahakoon, Damminda

Author_Institution

Cognitive & Connectionist Syst. Lab., Monash Univ., Clayton, VIC, Australia

fYear

2012

fDate

10-15 June 2012

Firstpage

1

Lastpage

8

Abstract

Sequences occur naturally in many domains such as biology, engineering, finance and scientific research. Since humans have the inherent ability to comprehend and utilize sequences in day-to-day cognitive tasks such as speech, vision and motor control, biologically inspired sequence learning techniques are used for explanatory data analysis in these domains. Identifying the common substrings that exist in sequences helps in determining the underlying structure and calculating the similarity between sequences. The suffix trie, suffix tree and suffix array are data structures used in many solutions to sequence-based problems. However, these are static data structures, not flexible tools that can be used for sequence learning. In this paper we present the Adaptive Suffix Trie algorithm, a sequence learning algorithm that can be used for identifying substrings of different lengths and frequencies from a given set of sequences. In contrast to suffix data structures, which store all suffixes, the adaptive suffix trie captures only the frequent substrings that occur in the given dataset, resulting in a less complex structure containing only the relevant or useful information. We show how the algorithm's learning parameters can be adapted for extracting substrings with the required characteristics and then demonstrate its application in the classification of biological sequences.
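
The contrast drawn in the abstract, between a structure that stores every suffix and one that keeps only frequent substrings, can be illustrated with a small sketch. This is not the Adaptive Suffix Trie algorithm itself, whose learning rule and parameters are not given in this record. It is a static, count-annotated trie followed by a frequency filter, and the names `TrieNode`, `build_counted_trie`, `frequent_substrings`, `max_len` and `min_count` are hypothetical choices made for illustration only.

```python
class TrieNode:
    """One trie node: child links plus an occurrence counter."""

    def __init__(self):
        self.children = {}
        self.count = 0


def build_counted_trie(sequences, max_len):
    """Insert every substring of length <= max_len and count occurrences.

    Assumption: max_len is an illustrative depth cap, not a parameter
    taken from the paper.
    """
    root = TrieNode()
    for seq in sequences:
        for start in range(len(seq)):
            node = root
            # Walk the (truncated) suffix starting at `start`; each node
            # on the path represents one substring occurrence.
            for symbol in seq[start:start + max_len]:
                node = node.children.setdefault(symbol, TrieNode())
                node.count += 1
    return root


def frequent_substrings(root, min_count):
    """Return {substring: count} for substrings seen at least min_count times.

    A substring can never occur more often than its own prefix, so any
    branch whose count falls below min_count can be skipped entirely.
    """
    result = {}
    stack = [("", root)]
    while stack:
        prefix, node = stack.pop()
        for symbol, child in node.children.items():
            if child.count >= min_count:
                substring = prefix + symbol
                result[substring] = child.count
                stack.append((substring, child))
    return result
```

For example, calling `frequent_substrings(build_counted_trie(["ACGT", "ACGA", "TTAC"], 3), 2)` keeps `ACG` (seen twice) and drops `CGT` (seen once). The paper's algorithm reportedly reaches a comparably compact structure adaptively during learning rather than by filtering a full trie afterwards.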

Keywords

DNA; RNA; biology computing; genetics; learning (artificial intelligence); molecular biophysics; proteins; tree data structures; adaptive suffix trie algorithm; biological sequence; cognitive task; complex structure; explanatory data analysis; sequence learning algorithm; similarity measure; static data structures; substring extraction; suffix array; suffix data structure; suffix tree; Arrays; Hebbian theory; Heuristic algorithms; Humans; Training; Vegetation; Frequent substring extraction; Sequence learning; Suffix trie;

fLanguage

English

Publisher

IEEE

Conference_Titel

Neural Networks (IJCNN), The 2012 International Joint Conference on

Conference_Location

Brisbane, QLD

ISSN

2161-4393

Print_ISBN

978-1-4673-1488-6

Electronic_ISBN

2161-4393

Type

conf

DOI

10.1109/IJCNN.2012.6252671

Filename

6252671