Title :
Mining frequent pattern within a genetic sequence using unique pattern indexing and mapping techniques
Author :
Mutakabbir, Kazi Mahbub; Mahin, Shah S.; Hasan, M. Anwar
Author_Institution :
Dept. of CSE, Islamic Univ. of Technol. (IUT), Gazipur, Bangladesh
Abstract :
Searching for frequent patterns within a specific genetic sequence has become an essential task in bioinformatics. Most recent work relies on techniques such as the Apriori algorithm, GSP, and MacroVspan; however, frequent pattern mining can be made more efficient. In this paper, we propose two algorithms. The first indexes each unique subsequence of length four with an integer value. The second finds the frequencies of frequent patterns of various lengths by searching through these integer values instead of the patterns themselves. Both steps are made efficient through mapping techniques such as a HashMap. Owing to its frugal use of memory, the proposed approach can reduce typical memory usage by at least 37.5%.
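The abstract does not give the algorithms' details, so the following is only a minimal sketch of the general idea under stated assumptions: each distinct length-4 window receives an integer ID in first-seen order, a Python `dict` stands in for the HashMap, and a pattern of length L (L >= 4) is keyed by the tuple of window IDs at offsets 0, 4, 8, ... plus one final overlapping window so that every position is covered. All function names (`index_unique_4mers`, `block_offsets`, `count_patterns`, `frequent_patterns`) are hypothetical, and the sketch does not attempt to reproduce the paper's memory figures.

```python
from collections import Counter


def index_unique_4mers(seq: str) -> tuple[dict[str, int], list[int]]:
    """Step 1 (sketch): give every distinct length-4 window an integer ID.

    Returns the pattern->ID map and the ID of the window starting at each position.
    """
    pattern_to_id: dict[str, int] = {}
    window_ids: list[int] = []
    for i in range(len(seq) - 3):
        window = seq[i:i + 4]
        # setdefault assigns the next free ID the first time a window is seen
        window_ids.append(pattern_to_id.setdefault(window, len(pattern_to_id)))
    return pattern_to_id, window_ids


def block_offsets(length: int) -> list[int]:
    """Offsets of length-4 windows that together cover a pattern of `length`."""
    if length < 4:
        raise ValueError("this sketch only handles patterns of length >= 4")
    offsets = list(range(0, length - 3, 4))
    if offsets[-1] != length - 4:
        offsets.append(length - 4)  # final window overlaps to cover the tail
    return offsets


def count_patterns(window_ids: list[int], length: int) -> Counter:
    """Step 2 (sketch): count every substring of `length` by a tuple of integer IDs."""
    offsets = block_offsets(length)
    counts: Counter = Counter()
    # one start position per substring of `length`; no nucleotide strings are compared
    for i in range(len(window_ids) - (length - 4)):
        counts[tuple(window_ids[i + off] for off in offsets)] += 1
    return counts


def decode(key: tuple[int, ...], offsets: list[int], id_to_pattern: dict[int, str]) -> str:
    """Rebuild the nucleotide string from an integer key (used only for reporting)."""
    chars: dict[int, str] = {}
    for off, pid in zip(offsets, key):
        for j, base in enumerate(id_to_pattern[pid]):
            chars[off + j] = base
    return "".join(chars[k] for k in sorted(chars))


def frequent_patterns(seq: str, length: int, min_support: int) -> dict[str, int]:
    """Return patterns of `length` occurring at least `min_support` times."""
    pattern_to_id, window_ids = index_unique_4mers(seq)
    id_to_pattern = {v: k for k, v in pattern_to_id.items()}
    offsets = block_offsets(length)
    counts = count_patterns(window_ids, length)
    return {decode(key, offsets, id_to_pattern): c
            for key, c in counts.items() if c >= min_support}
```

For example, `frequent_patterns("ACGTACGTACGTTT", 8, 2)` returns `{"ACGTACGT": 2}`: the two occurrences share the integer key `(0, 0)`, so they are matched by comparing integers rather than characters. Because the ID-to-window mapping is one-to-one and the chosen offsets cover every position, distinct substrings always receive distinct keys.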
Keywords :
bioinformatics; data mining; genetic algorithms; indexing; pattern classification; Apriori algorithm; GSP; MacroVspan; bioinformatics sector; frequent pattern mining; genetic sequence; integer value; integer values; mapping techniques; mining frequent pattern; unique pattern indexing; Algorithm design and analysis; Bioinformatics; DNA; Data mining; Indexing; Informatics; ASCII Byte-encoding; HashMap; Pattern Indexing; frequent pattern
Conference_Title :
2014 International Conference on Informatics, Electronics & Vision (ICIEV)
Conference_Location :
Dhaka
Print_ISBN :
978-1-4799-5179-6
DOI :
10.1109/ICIEV.2014.6850729