• DocumentCode
    2370540
  • Title

    Tractable group detection on large link data sets

  • Author

    Kubica, Jeremy ; Moore, Andrew ; Schneider, Jeff

  • Author_Institution
    Robotics Inst., Carnegie Mellon Univ., Pittsburgh, PA, USA
  • fYear
    2003
  • fDate
    19-22 Nov. 2003
  • Firstpage
    573
  • Lastpage
    576
  • Abstract
    Discovering underlying structure from co-occurrence data is an important task in a variety of fields, including insurance, intelligence, criminal investigation, epidemiology, human resources, and marketing. Previously, Kubica et al. presented the group detection algorithm (GDA) - an algorithm for finding underlying groupings of entities from co-occurrence data. This algorithm is based on a probabilistic generative model and produces coherent groups that are consistent with prior knowledge. Unfortunately, the optimization used in GDA is slow, potentially making it infeasible for many large data sets. To this end, we present k-groups - an algorithm that uses an approach similar to that of k-means to significantly accelerate the discovery of groups while retaining GDA's probabilistic model. We compare the performance of GDA and k-groups on a variety of data, showing that k-groups' sacrifice in solution quality is significantly offset by its increase in speed.
  • Keywords
    belief networks; data mining; learning (artificial intelligence); maximum likelihood estimation; probability; very large databases; co-occurrence data; group detection algorithm; k-group algorithm; large link data set; probabilistic generative model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
  • Print_ISBN
    0-7695-1978-4
  • Type

    conf

  • DOI
    10.1109/ICDM.2003.1250980
  • Filename
    1250980