Abstract :
Local pattern mining methods are fragmented along two dimensions: the pattern syntax, and the data types on which they are applicable. Pattern syntaxes considered in the literature include subgroups, n-sets, itemsets, and many more; common data types include binary, categorical, and real-valued. Recent research on pattern mining in relational databases has shown how the aforementioned pattern syntaxes can be unified in a single framework. However, a unified understanding of how to deal with various data types is lacking, certainly for more complexly structured types such as time of day (which is circular), geographical location, terms from a taxonomy, etc. In this paper, we introduce a generic approach for mining interesting local patterns in (relational) data involving such structured data types as attributes. Importantly, we show how this can be done in a generic manner, by modelling the structure within a set of attribute values as a partial order. We then derive a measure of subjective interestingness of such patterns using Information Theory, and propose an algorithm for effectively enumerating all patterns of this syntax. Through empirical evaluation, we found that (a) the new interestingness derivation is relevant and cannot be approximated using existing tools, (b) the new tool, P-N-RMiner, finds patterns that are substantially more informative, and (c) the new enumeration algorithm is considerably faster.
Keywords :
"Data mining","Syntactics","Relational databases","Taxonomy","Itemsets","Approximation algorithms"