• DocumentCode
    2163276
  • Title
    How efficient is estimation with missing data?
  • Author
    Karadogan, Seliz G. ; Marchegiani, Letizia ; Hansen, Lars Kai ; Larsen, Jan
  • Author_Institution
    DTU Inf., Tech. Univ. of Denmark, Lyngby, Denmark
  • fYear
    2011
  • fDate
    22-27 May 2011
  • Firstpage
    2260
  • Lastpage
    2263
  • Abstract
    In this paper, we present a new evaluation approach for missing data techniques (MDTs) in which their efficiency is investigated using the listwise deletion method as a reference. We experiment on classification problems and calculate misclassification rates (MR) for different missing data percentages (MDP) using a missing completely at random (MCAR) scheme. We compare three MDTs: pairwise deletion (PW), mean imputation (MI) and a maximum likelihood method that we call complete expectation maximization (CEM). We use a synthetic dataset, the Iris dataset and the Pima Indians Diabetes dataset. We train a Gaussian mixture model (GMM) and test it in two cases, in which the test dataset is either missing data or complete. The results show that CEM is the most efficient method in both cases, while MI is the worst performer of the three. PW and CEM prove to be more stable than MI, in particular for higher MDP values.
  • Keywords
    Gaussian processes; data handling; expectation-maximisation algorithm; Gaussian mixture model; MCAR scheme; Pima Indians diabetes dataset; classification problem; complete expectation maximization; iris dataset; maximum likelihood method; mean imputation; misclassification rates; missing completely at random scheme; missing data percentage; missing data technique; pairwise deletion; trained GMM; Covariance matrix; Data models; Diabetes; Iris; Maximum likelihood estimation; Robustness; Machine learning; missing data techniques; supervised learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
  • Conference_Location
    Prague
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4577-0538-0
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2011.5946932
  • Filename
    5946932
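
The abstract names an MCAR scheme, listwise deletion, mean imputation and pairwise deletion without defining them. The following is a minimal sketch of the common textbook forms of these operations, not the authors' implementation: the function names, the use of `None` as the missing-value marker, the 20% missing data percentage and the synthetic Gaussian data are all assumptions made for illustration. The GMM classifier and the CEM method from the paper are not reproduced here.

```python
import random
from statistics import fmean


def apply_mcar(rows, mdp, rng):
    """Hide each entry independently with probability mdp (MCAR).

    Missing entries are marked with None (an illustrative convention).
    """
    return [[None if rng.random() < mdp else v for v in row] for row in rows]


def listwise_deletion(rows):
    """Keep only the rows that have no missing entries."""
    return [row for row in rows if None not in row]


def mean_imputation(rows):
    """Fill each missing entry with the mean of its column's observed values.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = [
        fmean([row[j] for row in rows if row[j] is not None])
        for j in range(n_cols)
    ]
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def pairwise_covariance(rows, i, j):
    """Sample covariance of columns i and j, using only rows where both are observed.

    Assumes at least two such rows exist.
    """
    pairs = [(r[i], r[j]) for r in rows
             if r[i] is not None and r[j] is not None]
    mean_i = fmean([a for a, _ in pairs])
    mean_j = fmean([b for _, b in pairs])
    return sum((a - mean_i) * (b - mean_j) for a, b in pairs) / (len(pairs) - 1)


# Illustrative data: 200 samples, 3 features, 20% of entries hidden at random.
rng = random.Random(0)
complete = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
missing = apply_mcar(complete, 0.2, rng)

kept = listwise_deletion(missing)
imputed = mean_imputation(missing)
print(len(kept), "of", len(missing), "rows survive listwise deletion")
print("pairwise covariance of features 0 and 1:",
      pairwise_covariance(missing, 0, 1))
```

Listwise deletion discards every row with at least one hidden entry, whereas pairwise deletion keeps each row for every statistic it can still contribute to. That difference in how much data is retained is what makes listwise deletion a natural baseline for comparing the other techniques.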