Discovering maximal subsequence patterns in sequence database

Author

Singhal, Leena ; Jain, Neha ; Gupta, Geeta ; Gupta, Neelima

Author_Institution

Dept. of Comput. Sci., Univ. of Delhi, Delhi, India

fYear

2009

fDate

14-15 Dec. 2009

Firstpage

1

Lastpage

5

Abstract

Mining sequential patterns in biological data has attracted a great deal of attention in the last couple of years. Biologists are interested in finding frequent ordered arrangements of motifs that may be responsible for the similar expression of a group of genes. The size of the output space can be greatly reduced if only the maximal frequent patterns are reported. In this paper we present the maximal PrefixSpan algorithm, which reports the maximal frequent patterns in a sequence database. Experimental results on synthetic data show that the size of the output space is greatly reduced when only maximal frequent patterns are reported.
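
To illustrate why reporting only maximal frequent patterns shrinks the output, the following is a minimal sketch and not the paper's maximal PrefixSpan algorithm. It assumes sequences of single items, a support threshold of at least 1, and a brute-force prefix-growth enumeration without projected databases; the function names and the toy database are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # "item in remaining" consumes the iterator, which enforces the ordering.
    return all(item in remaining for item in pattern)


def frequent_patterns(database, min_support):
    """Enumerate frequent subsequences by growing each frequent prefix by one item.

    Assumption: min_support >= 1, so growth stops once no extension is supported.
    This is a brute-force stand-in for illustration only.
    """
    alphabet = sorted({item for seq in database for item in seq})
    frequent = []
    frontier = [()]
    while frontier:
        next_frontier = []
        for prefix in frontier:
            for item in alphabet:
                candidate = prefix + (item,)
                support = sum(is_subsequence(candidate, seq) for seq in database)
                if support >= min_support:
                    frequent.append(candidate)
                    next_frontier.append(candidate)
        frontier = next_frontier
    return frequent


def maximal_patterns(patterns):
    """Keep only patterns that are not a subsequence of another pattern in the list."""
    return [
        p for p in patterns
        if not any(p != q and is_subsequence(p, q) for q in patterns)
    ]


# Hypothetical toy database: three sequences over the items A, B, C, D.
toy_database = ["ABCD", "ACD", "ABD"]
all_frequent = frequent_patterns(toy_database, min_support=2)
only_maximal = maximal_patterns(all_frequent)
```

On this toy database with a support threshold of 2, every frequent pattern is a subsequence of one of the maximal ones, so the maximal set is a far smaller summary of the same information about which patterns are frequent (though not of their individual support counts).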

Keywords

biology computing; data mining; biological data; maximal PrefixSpan algorithm; maximal frequent pattern; maximal subsequence pattern discovery; sequence database; sequential pattern mining; Computer science; Costs; Data mining; Databases; Proteins; Sampling methods; Testing; Maximal frequent sequences; Sequence mining; TFBS

fLanguage

English

Publisher

ieee

Conference_Titel

Methods and Models in Computer Science, 2009. ICM2CS 2009. Proceeding of International Conference on

Conference_Location

Delhi

Print_ISBN

978-1-4244-5051-0

Type

conf

DOI

10.1109/ICM2CS.2009.5397958

Filename

5397958