• DocumentCode
    2886237
  • Title

    Outlier Detection in Logistic Regression: A Quest for Reliable Knowledge from Predictive Modeling and Classification

  • Author

    Nurunnabi, Abdul ; West, Geoff

  • Author_Institution
    Dept. of Spatial Sci., Curtin Univ., Perth, WA, Australia
  • fYear
    2012
  • fDate
    10-10 Dec. 2012
  • Firstpage
    643
  • Lastpage
    652
  • Abstract
    Logistic regression is well known to the data mining research community as a tool for modeling and classification. The presence of outliers is an unavoidable phenomenon in data analysis. Detection of outliers is important to increase the accuracy of the required estimates and for reliable knowledge discovery from the underlying databases. Most of the existing outlier detection methods in regression analysis are based on the single-case deletion approach, which is inefficient in the presence of multiple outliers because of the well-known masking and swamping effects. To avoid these effects, the multiple-case deletion approach has been introduced. We propose a diagnostic measure based on a group deletion approach for identifying multiple influential observations in logistic regression. At the same time, we introduce a plotting technique that can classify data into outliers, high leverage points, influential observations, and regular observations. This paper has two objectives. First, it investigates the problems of outlier detection in logistic regression, proposes a new method that can find multiple influential observations, and classifies the types of outlier. Second, it shows the necessity of properly identifying outliers and influential observations as a prelude to reliable knowledge discovery from modeling and classification via logistic regression. We demonstrate the efficiency of our method, compare its performance with existing popular diagnostic methods, and explore the necessity of outlier detection for reliability and robustness in modeling and classification using real datasets.
  • Keywords
    data mining; database management systems; pattern classification; regression analysis; data analysis; data mining research community; knowledge discovery; logistic regression; outlier detection; predictive classification; predictive modeling; reliable knowledge; Algorithm design and analysis; Classification algorithms; Data mining; Data models; Logistics; Reliability; data mining; high leverage point; influential observation; knowledge discovery; outlier; pattern recognition; regression; reliability; statistical computing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on
  • Conference_Location
    Brussels
  • Print_ISBN
    978-1-4673-5164-5
  • Type

    conf

  • DOI
    10.1109/ICDMW.2012.107
  • Filename
    6406412