• DocumentCode
    3519535
  • Title

    Biological Data Outlier Detection Based on Kullback-Leibler Divergence

  • Author

    Oh, Jung Hun ; Gao, Jean ; Rosenblatt, Kevin

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Univ. of Texas, Arlington, TX
  • fYear
    2008
  • fDate
    3-5 Nov. 2008
  • Firstpage
    249
  • Lastpage
    254
  • Abstract
    Outlier detection is imperative in biomedical data analysis to achieve reliable knowledge discovery. In this paper, a new outlier detection method based on Kullback-Leibler (KL) divergence is presented. KL divergence was originally designed as a measure of distance between two distributions. Building on that concept, we extend it to biological sample outlier detection by forming sample sets composed of nearest neighbors. To handle non-linearity during the KL divergence calculation and to tackle the singularity problem caused by small sample size, we map the original data into a higher-dimensional feature space and apply kernel functions without resorting to an explicit mapping function. The sample with the largest KL divergence is detected as an outlier. The proposed method is tested on one synthetic data set, two public gene expression data sets, and our own mass spectrometry data generated for a prostate cancer study.
  • Keywords
    biology computing; data mining; medical computing; regression analysis; KL divergence calculation; KL divergence concept; Kullback-Leibler divergence; biological data outlier detection; biological sample outlier detection; biomedical data analysis; distribution distance measure; higher feature space mapping; kernel functions; knowledge discovery; mass spectrometry data; nearest neighbors; prostate cancer study; public gene expression data sets; singularity problem; Bioinformatics; Biology; Clustering algorithms; Data analysis; Intrusion detection; Kernel; Medical diagnostic imaging; Nearest neighbor searches; Object detection; Support vector machines; mass spectrometry; outlier detection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine, 2008. BIBM '08. IEEE International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    978-0-7695-3452-7
  • Type

    conf

  • DOI
    10.1109/BIBM.2008.76
  • Filename
    4684899
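
The idea summarized in the abstract (score each sample by a KL divergence computed over nearest-neighbor sets, then flag the largest score) can be illustrated with a minimal sketch. This is not the authors' kernelized algorithm: it is a simplified stand-in that works in the input space, assumes each neighbor set is Gaussian, and replaces the kernel treatment of the singularity problem with a small ridge term on the covariance. The choice of which two sets to compare (a sample's neighbors with and without the sample itself), the function names, and the parameters `k` and `ridge` are all assumptions made for illustration.

```python
import numpy as np


def fit_gaussian(X, ridge):
    """Mean and ridge-regularized covariance of the rows of X (assumed Gaussian)."""
    mu = X.mean(axis=0)
    # The ridge term keeps the covariance invertible when samples are few.
    cov = np.cov(X, rowvar=False, bias=True) + ridge * np.eye(X.shape[1])
    return mu, cov


def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) )."""
    d = mu0.shape[0]
    diff = mu1 - mu0
    trace_term = np.trace(np.linalg.solve(cov1, cov0))
    maha_term = diff @ np.linalg.solve(cov1, diff)
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (trace_term + maha_term - d + logdet1 - logdet0)


def kl_outlier_scores(X, k=8, ridge=1e-3):
    """Score each sample by how much adding it to its k nearest neighbors
    changes the fitted Gaussian (illustrative assumption, not the paper's rule)."""
    n = X.shape[0]
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.empty(n)
    for i in range(n):
        order = np.argsort(dists[i])
        neighbors = order[order != i][:k]           # k nearest, excluding i
        with_i = np.append(neighbors, i)            # same set plus sample i
        mu_a, cov_a = fit_gaussian(X[with_i], ridge)
        mu_b, cov_b = fit_gaussian(X[neighbors], ridge)
        scores[i] = gaussian_kl(mu_a, cov_a, mu_b, cov_b)
    return scores


# Usage: 40 synthetic inliers plus one planted far-away point (index 40).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(40, 2)), [[10.0, 10.0]]])
scores = kl_outlier_scores(X)
suspect = int(np.argmax(scores))  # index of the sample with the largest score
```

The ridge term is only a crude substitute for the kernel mapping described in the abstract; it addresses invertibility but not non-linearity.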