Title :
Distributed discovery of asynchronous partial periodic patterns in sequence data using modified periodicity transform
Author :
Hsiao, Han-Wen ; Meng-Shu Tsai ; Tsai, Jeffrey J P
Author_Institution :
Inst. of Bioinformatics, Taichung Healthcare & Manage. Univ., Taiwan
Abstract :
Discovering frequent subsequences as characteristic patterns in large sequence databases is an important task in a variety of applications, such as biological sequence analysis. In general, the patterns to be discovered may occur only partially and asynchronously in the sequences, and may even contain gaps. In addition, the locations and frequencies of the patterns may be of interest for subsequent analysis. Another concern is how to enumerate candidate patterns for evaluation without an exponential increase in computation time. The modified periodicity transform is proposed to meet these requirements. For a synthetic sequence of length 300 K, mining all partial periodic patterns of length 5 takes 4 seconds. With minor modification, the approach can handle asynchronous partial periodic patterns of arbitrary length. The approach is by nature suited to distributed environments, and a prototype system has been developed in Java for distributed computing. The system could serve as a feature extractor in an early stage of sequence analysis.
Keywords :
Java; biology computing; data mining; pattern recognition; sequences; very large databases; asynchronous partial periodic patterns distributed discovery; biological sequence analysis; distributed computing; distributed environments; feature extractor; modified periodicity transform; prototype system; sequence databases; Application software; Bioinformatics; Biology; Computer science; Databases; Information technology; Medical services; Pattern analysis; Technology management
Conference_Title :
Multimedia Software Engineering, 2004. Proceedings. IEEE Sixth International Symposium on
Print_ISBN :
0-7695-2217-3
DOI :
10.1109/MMSE.2004.42