Title :
Discriminative Hat Matrix: A new tool for outlier identification and linear regression
Author :
Dufrenois, F. ; Noyer, J.C.
Author_Institution :
SYVIP Team, LISIC, Calais, France
fDate :
July 31 2011-Aug. 5 2011
Abstract :
The hat matrix is an important auxiliary quantity in linear regression theory for detecting errors in the predictors. Traditionally, comparing its diagonal elements with a calibration point serves as the decision rule for separating a dominant linear population from outliers. However, this approach has several problems. First, the calibration point is not well defined, because no exact statistical distribution (asymptotic form) of the hat matrix diagonal exists [1]. Second, because it is based on the standard covariance matrix, this outlier measure loses its efficiency when the proportion of “atypical” observations becomes large [2][3]. In this paper, we present a discriminative version of the hat matrix (DHM) that recasts this classification problem as a subspace clustering problem. We propose a criterion based on linear discriminant analysis and built directly on the properties of the hat matrix, and we show that maximizing it amounts to searching for an optimal projection subspace and an optimal indicator matrix. We also show that the statistic of the hat matrix diagonal “projected” onto this optimal subspace follows an exact χ² distribution, which makes it possible to identify outliers by hypothesis testing. Synthetic data sets are used to study the performance of the proposed approach in terms of both regression and classification. We also illustrate its potential application to motion segmentation in image sequences.
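The traditional decision rule the abstract argues against can be sketched as follows. This is a minimal illustration, not the authors' DHM method: it computes the diagonal h_ii of H = X(XᵀX)⁻¹Xᵀ and flags rows whose leverage exceeds a calibration point. The cutoff 2p/n is an assumption (a common rule of thumb, not taken from the paper), and the function names `hat_diagonal` and `flag_high_leverage` are hypothetical.

```python
# Minimal sketch of classical hat-matrix (leverage) outlier flagging.
# Assumption: calibration point = factor * p / n with factor = 2 (rule of thumb).
# This is NOT the discriminative hat matrix (DHM) proposed in the paper.

def _solve(a, b):
    """Solve a @ x = b for square a by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("X^T X is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):  # eliminate this column from every other row
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def hat_diagonal(X):
    """Return h_ii = x_i^T (X^T X)^{-1} x_i for each row x_i of X (n x p list of lists)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    return [sum(xi * zi for xi, zi in zip(row, _solve(xtx, row))) for row in X]

def flag_high_leverage(X, factor=2.0):
    """Return (indices with h_ii > factor * p / n, all h_ii, cutoff)."""
    n, p = len(X), len(X[0])
    h = hat_diagonal(X)
    cutoff = factor * p / n
    return [i for i, hi in enumerate(h) if hi > cutoff], h, cutoff

if __name__ == "__main__":
    # Intercept column plus one predictor; a single extreme x value.
    X = [[1.0, x] for x in [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]]
    print(flag_high_leverage(X)[0])
```

With a single extreme predictor value (x = 30), its leverage is about 0.91 and it exceeds the cutoff 0.4. If three identical extreme values are present instead (x = 1…7, 30, 30, 30), each has leverage of only about 0.33 and none is flagged, a small instance of the masking effect behind the abstract's remark that the measure degrades as the proportion of atypical observations grows.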
Keywords :
covariance matrices; pattern classification; pattern clustering; regression analysis; atypical observations; classification problem; covariance matrix; discriminative hat matrix; dominant linear population; hypothesis testing; image sequences; linear discriminant analysis; linear regression theory; motion segmentation; optimal indicator matrix; optimal projection subspace; outlier identification; predictor error detection; subspace clustering problem; Covariance matrix; Distributed databases; Eigenvalues and eigenfunctions; Linear regression; Matrix decomposition; Robustness; Vectors;
Conference_Title :
Neural Networks (IJCNN), The 2011 International Joint Conference on
Conference_Location :
San Jose, CA
Print_ISBN :
978-1-4244-9635-8
DOI :
10.1109/IJCNN.2011.6033300