• DocumentCode
    2709902
  • Title
    Finding Good Itemsets by Packing Data
  • Author
    Tatti, Nikolaj ; Vreeken, Jilles
  • Author_Institution
    Dept. of Inf. & Comput. Sci., Helsinki Univ. of Technol., Helsinki
  • fYear
    2008
  • fDate
    15-19 Dec. 2008
  • Firstpage
    588
  • Lastpage
    597
  • Abstract
    The problem of selecting small groups of itemsets that represent the data well has recently gained a lot of attention. We approach the problem by searching for the itemsets that compress the data efficiently. As a compression technique we use decision trees combined with a refined version of MDL. More formally, assuming that the items are ordered, we create a decision tree for each item that may only depend on the previous items. Our approach allows us to find complex interactions between the attributes, not just co-occurrences of 1s. Further, we present a link between the itemsets and the decision trees and use this link to export the itemsets from the decision trees. In this paper we present two algorithms. The first one is a simple greedy approach that builds a family of itemsets directly from data. The second one, given a collection of candidate itemsets, selects a small subset of these itemsets. Our experiments show that these approaches result in compact and high-quality descriptions of the data.
  • Keywords
    data mining; decision trees; complex interactions; compression technique; itemsets; packing data; Association rules; Computer science; Encoding; Explosions; Frequency; Length measurement; Proposals
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
  • Conference_Location
    Pisa
  • ISSN
    1550-4786
  • Print_ISBN
    978-0-7695-3502-9
  • Type
    conf
  • DOI
    10.1109/ICDM.2008.39
  • Filename
    4781154