Title :
The Minimum Consistent Subset Cover Problem: A Minimization View of Data Mining
Author :
Gao, Byron J. ; Ester, Martin ; Xiong, Hui ; Cai, Jin-Yi ; Schulte, Oliver
Author_Institution :
Dept. of Comput. Sci., Texas State Univ., San Marcos, TX, USA
Abstract :
In this paper, we introduce and study the minimum consistent subset cover (MCSC) problem. Given a finite ground set X and a constraint t, the task is to find the minimum number of consistent subsets that cover X, where a subset of X is consistent if it satisfies t. The MCSC problem generalizes the traditional set covering problem and has minimum clique partition (MCP), a dual problem of graph coloring, as an instance. Many common data mining tasks in rule learning, clustering, and pattern mining can be formulated as MCSC instances. In particular, we discuss the minimum rule set (MRS) problem, which minimizes the model complexity of decision rules; the converse k-clustering problem, which minimizes the number of clusters; and the pattern summarization problem, which minimizes the number of patterns. Our proposed generic algorithm, CAG, is directly applicable to any of these MCSC instances. CAG starts by constructing a maximal optimal partial solution, then performs an example-driven specific-to-general search on a dynamically maintained bipartite assignment graph to simultaneously learn a small-cardinality set of consistent subsets that covers the ground set.
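To make the problem statement concrete, the following is a minimal sketch of an MCSC instance in which the constraint t is "the subset forms a clique", i.e. the minimum clique partition case named in the abstract. It uses a simple greedy heuristic, not the CAG algorithm, and it is not guaranteed to return a minimum cover. The names `greedy_consistent_cover`, `is_clique`, and `EDGES` are hypothetical, and the sketch assumes every singleton subset is consistent so that a cover always exists.

```python
from itertools import combinations

def greedy_consistent_cover(ground_set, is_consistent):
    """Greedy heuristic (NOT the paper's CAG): cover ground_set with consistent subsets.

    Assumption: every singleton {x} satisfies the constraint.
    """
    uncovered = list(ground_set)  # keep input order so the result is deterministic
    cover = []
    while uncovered:
        seed = uncovered.pop(0)
        subset = {seed}
        if not is_consistent(subset):
            raise ValueError(f"singleton {seed!r} is not consistent; no cover exists")
        remaining = []
        for x in uncovered:
            # Grow the current subset only while it stays consistent.
            if is_consistent(subset | {x}):
                subset.add(x)
            else:
                remaining.append(x)
        uncovered = remaining
        cover.append(subset)
    return cover

# Hypothetical toy graph: a triangle 1-2-3 plus a separate edge 4-5.
EDGES = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (4, 5)]}

def is_clique(subset):
    """Constraint t for the MCP instance: all pairs in the subset are adjacent."""
    return all(frozenset((u, v)) in EDGES for u, v in combinations(subset, 2))

X = [1, 2, 3, 4, 5]
cover = greedy_consistent_cover(X, is_clique)
print([sorted(s) for s in cover])  # [[1, 2, 3], [4, 5]]
```

Swapping `is_clique` for a different predicate (for example, "all examples in the subset are covered by one rule") would give another MCSC instance with the same interface, which is the sense in which the abstract describes the problem as generic.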
Keywords :
computational complexity; data mining; decision making; genetic algorithms; graph colouring; learning (artificial intelligence); minimisation; pattern clustering; set theory; MCP; MCSC problem; MRS problem; bipartite assignment graph; converse k-clustering problem; data mining tasks; decision rule complexity; finite ground set; generic algorithm CAG; graph coloring; maximal optimal partial solution; minimum clique partition; minimum consistent subset cover problem; minimum rule set problem; pattern mining; pattern summarization problem; rule learning; Clustering algorithms; Complexity theory; Decision trees; Minimization; Pattern recognition; Minimum consistent subset cover; converse k-clustering; minimum rule set; minimum star partition; pattern summarization; set covering
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2011.260