Title :
Mining Loosely Structured Motifs from Biological Data
Author :
Fassetti, Fabio ; Greco, Gianluigi ; Terracina, Giorgio
Author_Institution :
Univ. of Calabria, Cosenza
Abstract :
The discovery of information encoded in biological sequences is playing an increasingly prominent role in identifying genetic diseases and in deciphering biological mechanisms. This information is usually encoded in patterns, also called motifs, that occur frequently in the sequences. Motif discovery has therefore received much attention in the literature, and several algorithms have been proposed that are specifically tailored to motifs exhibiting some kind of "regular structure". Motivated by biological observations, this paper focuses on mining loosely structured motifs, i.e., more general kinds of motifs in which several "exceptions" may be tolerated across pattern repetitions. To this end, an algorithm that exploits data structures designed to efficiently handle pattern variability is presented and analyzed. Furthermore, a randomized variant with linear time and space complexity is introduced, and a theoretical guarantee on its performance is proven. Both algorithms have been implemented and tested on real data sets. Despite their ability to mine very complex kinds of patterns, the performance results show that the proposed techniques are applicable at genome-wide scale.
Keywords :
biology computing; computational complexity; data mining; data structures; genetics; pattern recognition; biological data mining; biological observations; biological sequences; deciphering biological mechanisms; genetic disease identification; genome-wide applicability; information discovery; linear time complexity; loosely structured motifs mining; pattern mining; pattern repetitions; randomized variant; space complexity; Bioinformatics; Biological information theory; Biological system modeling; DNA; Databases; Genetics; Genomics; Organisms; Proteins; Sequences; Bioinformatics (genome or protein) databases; Data mining; Mining methods and algorithms
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2008.65