• DocumentCode
    81292
  • Title

    Universal Outlier Hypothesis Testing

  • Author

    Li, Yun ; Nitinawarat, S. ; Veeravalli, Venugopal V.

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Illinois, Urbana, IL, USA
  • Volume
    60
  • Issue
    7
  • fYear
    2014
  • fDate
    Jul-14
  • Firstpage
    4066
  • Lastpage
    4082
  • Abstract
    Outlier hypothesis testing is studied in a universal setting. Multiple sequences of observations are collected, a small subset of which are outliers. A sequence is considered an outlier if the observations in that sequence are distributed according to an outlier distribution, distinct from the typical distribution governing the observations in all the other sequences. Nothing is known about the outlier and typical distributions except that they are distinct and have full supports. The goal is to design a universal test to best discern the outlier sequence(s). For models with exactly one outlier sequence, the generalized likelihood test is shown to be universally exponentially consistent. A single-letter characterization of the error exponent achievable by the test is derived, and it is shown that the test achieves the optimal error exponent asymptotically as the number of sequences approaches infinity. When the null hypothesis with no outlier is included, a modification of the generalized likelihood test is shown to achieve the same error exponent under each non-null hypothesis, and also consistency under the null hypothesis. Then, models with more than one outlier are studied in the following settings. For the setting with a known number of distinctly distributed outliers, the achievable error exponent of the generalized likelihood test is characterized. The limiting error exponent achieved by such a test is characterized, and the test is shown to be asymptotically exponentially consistent. For the setting with an unknown number of identically distributed outliers, a modification of the generalized likelihood test is shown to achieve a positive error exponent under each non-null hypothesis, and also consistency under the null hypothesis. When the outlier sequences can be distinctly distributed (with their total number being unknown), it is shown that a universally exponentially consistent test cannot exist, even when the typical distribution is known and the null hypothesis is excluded.
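    As a rough illustration of the single-outlier case described in the abstract, the sketch below scores each candidate index by how well the remaining sequences agree with their own pooled empirical distribution, then picks the candidate whose removal leaves the most homogeneous set. This is one common way to write a generalized likelihood test when neither distribution is known. The function names, the finite-alphabet input format, and the equal-length assumption are illustrative choices, not details taken from this record.

```python
import math
from collections import Counter


def empirical_distribution(sequence, alphabet):
    """Relative frequency of each alphabet symbol in one sequence."""
    counts = Counter(sequence)
    n = len(sequence)
    return [counts[a] / n for a in alphabet]


def kl_divergence(p, q):
    """D(p || q) in nats; terms with p_i == 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def detect_single_outlier(sequences, alphabet):
    """Return (index, scores) for a model with exactly one outlier.

    Assumes at least three sequences of equal length over a known
    finite alphabet. For candidate i, the score is the sum over j != i
    of D(gamma_j || average of gamma_k for k != i), where gamma_j is the
    empirical distribution of sequence j. The smallest score wins.
    """
    m = len(sequences)
    if m < 3:
        raise ValueError("need at least three sequences")
    empiricals = [empirical_distribution(s, alphabet) for s in sequences]
    scores = []
    for i in range(m):
        others = [empiricals[j] for j in range(m) if j != i]
        # Pooled estimate of the typical distribution with i excluded.
        pooled = [sum(column) / (m - 1) for column in zip(*others)]
        scores.append(sum(kl_divergence(g, pooled) for g in others))
    best = min(range(m), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    data = ["aaab" * 5, "aaab" * 5, "abbb" * 5, "aaab" * 5]
    index, scores = detect_single_outlier(data, "ab")
    print("declared outlier:", index)
```

    The pooled distribution always includes the sequence being compared against it, so the divergence stays finite even when some symbol is missing from an individual sequence.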
  • Keywords
    statistical testing; error exponent; generalized likelihood test; limiting error; null hypothesis; outlier distribution; single-letter characterization; universal outlier hypothesis testing; universal setting; Decoding; Encoding; Error probability; Manganese; Measurement; Testing; Training data; Anomaly detection; big data; classification; consistency; exponential consistency; fraud detection; outlier detection
  • fLanguage
    English
  • Journal_Title
    Information Theory, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9448
  • Type

    jour

  • DOI
    10.1109/TIT.2014.2317691
  • Filename
    6799184