DocumentCode :
2649626
Title :
Itemset Mining in Noisy Contexts: A Hybrid Approach
Author :
Mouhoubi, Karima ; Létocart, Lucas ; Rouveirol, Céline
Author_Institution :
LIPN, Univ. Paris 13, Villetaneuse, France
fYear :
2011
fDate :
7-9 Nov. 2011
Firstpage :
33
Lastpage :
40
Abstract :
A common task in data mining is to find all rectangles of 1s in a Boolean matrix in which the order of the rows and columns is irrelevant. However, most algorithms developed for this task cannot be adapted to real data that may contain noise. Noise shatters relevant itemsets into many small, irrelevant itemsets, causing an explosion in the number of resulting itemsets. Recent algorithms proposed to address this problem suffer from several limitations: a large number of results, very high execution times, and an inability to discover overlapping patterns. In this work, we propose a new heuristic approach based on a graph algorithm for the efficient extraction of itemset patterns in noisy binary contexts. The method uses maximal flow/minimal cut algorithms to find dense subgraphs of 1s in the graph associated with the Boolean data matrix. To evaluate our approach, we performed experiments on both synthetic data and real datasets from bioinformatics applications. We compared our results on various synthetic datasets and a gene-expression dataset with those of several other methods and show that (i) our method is quite efficient and (ii) the patterns extracted by our algorithm are of better quality than those of the other methods.
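The shattering effect described in the abstract can be illustrated with a small sketch. This is not the authors' max-flow/min-cut heuristic, which the abstract does not specify in detail. It is a brute-force enumeration of maximal all-ones rectangles, exponential in the number of rows and usable only on toy matrices. The function name `maximal_rectangles` and the example matrices are assumptions made for illustration.

```python
from itertools import combinations


def maximal_rectangles(matrix):
    """Return every maximal all-ones rectangle as a (rows, cols) pair of frozensets.

    Hypothetical helper for illustration: brute force over row subsets,
    so it is only practical for very small matrices.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    found = set()
    for size in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), size):
            # Columns that hold a 1 in every chosen row.
            cols = frozenset(
                c for c in range(n_cols) if all(matrix[r][c] for r in rows)
            )
            if not cols:
                continue
            # Extend the row set to every row that is 1 on all those columns,
            # which makes the rectangle maximal in both directions.
            closed_rows = frozenset(
                r for r in range(n_rows) if all(matrix[r][c] for c in cols)
            )
            found.add((closed_rows, cols))
    return found


# A clean 3x3 block of 1s (assumed toy data).
clean = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]

# The same block with one cell flipped to 0 to simulate noise.
noisy = [row[:] for row in clean]
noisy[1][1] = 0

print(len(maximal_rectangles(clean)))  # 1
print(len(maximal_rectangles(noisy)))  # 2
```

A single flipped cell already splits one rectangle into two overlapping ones. On larger matrices with more noise the number of exact rectangles grows quickly, which is the motivation the abstract gives for a noise-tolerant, dense-subgraph formulation.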
Keywords :
Boolean algebra; bioinformatics; data mining; graph theory; matrix algebra; Boolean data matrix; data mining; gene-expression data; graph algorithm; item set pattern extraction; itemset mining; maximal flow cut algorithm; minimal cut algorithm; noisy binary contexts; synthetic datasets; Bioinformatics; Bipartite graph; Context; Data mining; Itemsets; Noise; Noise measurement; Data Mining; dense subgraphs; maximal flow/minimal cut; noisy datasets;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Tools with Artificial Intelligence (ICTAI), 2011 23rd IEEE International Conference on
Conference_Location :
Boca Raton, FL
ISSN :
1082-3409
Print_ISBN :
978-1-4577-2068-0
Electronic_ISBN :
1082-3409
Type :
conf
DOI :
10.1109/ICTAI.2011.14
Filename :
6103303