DocumentCode
3156341
Title
Generating closed frequent gensets under constraints based on FP-Tree structure
Author
Trabelsi, C. ; Latiri, C. ; Ghedira, K.
Author_Institution
Res. group SOIE, Tunisian High Sch. of Manage., Tunis
Volume
2
fYear
2006
fDate
4-6 Oct. 2006
Firstpage
1526
Lastpage
1531
Abstract
The mechanism of gene regulation is of great interest to biologists, especially in genomics. One of the mechanisms controlling gene expression is provided by transcription factors, which are proteins that can either repress or stimulate the transcription of a gene. In this paper, we propose a new data mining algorithm, based on Boolean contexts, to extract a priori relevant frequent closed gensets, i.e., sets of tissues and associated sets of genes and transcription factors that are useful to the biologist. The key feature of our algorithm is a better compromise between the size of the search space and the knowledge conveyed by the discovered patterns. To this end, the proposed algorithm, called MC2G (mining constraint closed gensets), uses the frequent pattern tree (FP-Tree) structure, which is an extended prefix-tree structure, to prune the search space. Moreover, MC2G allows statistical and syntactic constraints to be defined on the desired frequent closed gensets and uses them during the extraction process. Experimental comparisons with other algorithms are carried out on real-world datasets.
Keywords
biology computing; data mining; genetics; proteins; statistics; tree searching; Boolean contexts; bioinformatics; closed frequent gensets; constraint closed genset mining; data mining; extended prefix-tree structure; frequent pattern-tree structure; gene regulation; pattern discovery; proteins; search space; statistical constraints; syntax constraints; Association rules; Bioinformatics; Biological information theory; Biology computing; Data mining; Drugs; Frequency; Genetic expression; Proteins; Systems engineering and theory; Closed frequent genset; Constraint-based data mining; FP-Tree structure; Gene expression; Pattern discovery; Transcription factor;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Engineering in Systems Applications, IMACS Multiconference on
Conference_Location
Beijing
Print_ISBN
7-302-13922-9
Electronic_ISBN
7-900718-14-1
Type
conf
DOI
10.1109/CESA.2006.4281879
Filename
4281879