Title :
What Sperner Family Concept Class is Easy to Be Enumerated?
Author :
Nakamura, Atsuyoshi ; Kudo, Mineichi
Author_Institution :
Grad. Sch. of Inf. Sci. & Technol., Hokkaido Univ., Sapporo
Abstract :
We study the problem of enumerating the concepts in a Sperner family concept class using subconcept queries, a general problem that includes maximal frequent itemset mining as an instance. Although even the theoretically best known algorithm needs quasi-polynomial time to solve this problem in the worst case, practically fast algorithms exist. This is because many real-world instances of the problem have low complexity under some measures. In this paper, we characterize the complexity of a Sperner family concept class by the VC dimension of its intersection closure and by its characteristic dimension, and we analyze the worst-case time complexity of enumerating its concepts in terms of the VC dimension. We also show that the VC dimension of real data used in data mining is actually small, by calculating it for some real datasets using a new algorithm closely related to the two introduced measures; this algorithm not only solves the enumeration problem but also reports the VC dimension of the intersection closure of the target concept class.
Keywords :
computational complexity; data mining; Sperner family concept class; enumeration problem; frequent itemset mining; quasi-polynomial time; subconcept queries; time complexity; Area measurement; Boolean functions; Data mining; Frequency; Information science; Itemsets; Relational databases; Size measurement; Tree graphs; Virtual colonoscopy; Sperner family; enumeration; maximal frequent itemset; simple hypergraph;
Conference_Title :
Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3502-9
DOI :
10.1109/ICDM.2008.131