• DocumentCode
    3124146
  • Title

    Exploiting False Discoveries -- Statistical Validation of Patterns and Quality Measures in Subgroup Discovery

  • Author

    Duivesteijn, Wouter ; Knobbe, Arno

  • Author_Institution
    LIACS, Leiden Univ., Leiden, Netherlands
  • fYear
    2011
  • fDate
    11-14 Dec. 2011
  • Firstpage
    151
  • Lastpage
    160
  • Abstract
    Subgroup discovery suffers from the multiple comparisons problem: we search through a large space, hence whenever we report a set of discoveries, this set will generally contain false discoveries. We propose a method to compare subgroups found through subgroup discovery with a statistical model we build for these false discoveries. We determine how much the subgroups we find deviate from the model, and hence statistically validate the found subgroups. Furthermore, we propose to use this subgroup validation to objectively compare quality measures used in subgroup discovery, by determining how much the top subgroups we find with each measure deviate from the statistical model generated with that measure. We thus aim to determine how good individual measures are in selecting significant findings. We invoke our method to experimentally compare popular quality measures in several subgroup discovery settings.
  • Keywords
    data mining; statistical analysis; false discoveries; quality measures; statistical model; statistical pattern validation; subgroup discovery; Association rules; Complexity theory; Histograms; Search problems; Silicon; Size measurement; Statistical validation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2011 IEEE 11th International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4577-2075-8
  • Type

    conf

  • DOI
    10.1109/ICDM.2011.65
  • Filename
    6137219