  • DocumentCode
    3125351
  • Title
    Density Estimation Based on Mass
  • Author
    Ting, Kai Ming; Washio, Takashi; Wells, Jonathan R.; Liu, Fei Tony
  • Author_Institution
    Gippsland Sch. of Inf. Technol., Monash Univ., Churchill, VIC, Australia
  • fYear
    2011
  • fDate
    11-14 Dec. 2011
  • Firstpage
    715
  • Lastpage
    724
  • Abstract
    Density estimation is the ubiquitous base modelling mechanism employed for many tasks such as clustering, classification, anomaly detection and information retrieval. Commonly used density estimation methods such as the kernel density estimator and the k-nearest neighbour density estimator have high time and space complexities, which render them inapplicable to problems with large data sizes and even a moderate number of dimensions. This weakness sets a fundamental limit on existing algorithms for all these tasks. We propose the first density estimation method that stretches this fundamental limit to such an extent that millions of data points can now be handled easily and quickly. We analyze the error of the new estimator with respect to the true density using a bias-variance analysis. We then evaluate the proposed method empirically by replacing the existing density estimators in two current density-based algorithms, namely DBSCAN and LOF, with the new one. The results show that the new density estimation method significantly improves the runtime of DBSCAN and LOF while maintaining or improving their task-specific performance in clustering and anomaly detection, respectively. The new method enables these algorithms, currently limited to small data sizes, to process very large databases, setting a new benchmark for what density-based algorithms can achieve.
  • Keywords
    error analysis; pattern clustering; ubiquitous computing; DBSCAN; LOF; anomaly detection; bias variance analysis; data size; density based algorithm; density estimation method; empirical evaluation; task specific performance; ubiquitous base modelling mechanism; Approximation methods; Clustering algorithms; Complexity theory; Databases; Estimation; Kernel; Noise; density estimation; density-based algorithms
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Mining (ICDM), 2011 IEEE 11th International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4577-2075-8
  • Type
    conf
  • DOI
    10.1109/ICDM.2011.47
  • Filename
    6137276