• DocumentCode
    3165364
  • Title
    Depth-Based Novelty Detection and Its Application to Taxonomic Research
  • Author
    Chen, Yixin; Bart, H.L.; Dang, Xin; Peng, Hanxiang
  • Author_Institution
    Univ. of Mississippi, Hattiesburg
  • fYear
    2007
  • fDate
    28-31 Oct. 2007
  • Firstpage
    113
  • Lastpage
    122
  • Abstract
    It is estimated that less than 10 percent of the world's species have been described, yet species are being lost daily as humans destroy natural habitats. The task of describing the earth's remaining species is made harder by the shrinking number of practicing taxonomists and the very slow pace of traditional taxonomic research. In this article, we approach one of the most important and challenging research objectives in taxonomy, new species identification, from a novelty detection perspective. We propose a unique and efficient novelty detection framework based on statistical depth functions. Statistical depth functions provide a "center-outward ordering" of multidimensional data, starting from the "deepest" point. In this sense, they can detect observations that appear extreme relative to the rest of the observations, i.e., novelties. Of the various statistical depths, the spatial depth is especially appealing because of its computational efficiency and mathematical tractability. We propose a novel statistical depth, the kernelized spatial depth (KSD), which generalizes the spatial depth via positive definite kernels. With a suitable kernel, the KSD can capture the local structure of a data set where the spatial depth fails. Observations with depth values below a threshold are declared novel. The proposed algorithm is simple in structure: for a given kernel, the threshold is the only parameter. We give an upper bound on the false alarm probability of a depth-based detector, which can be used to determine the threshold. An experimental study demonstrates the method's excellent potential for new species discovery.
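    The abstract describes detection as thresholding a depth value. The following minimal Python sketch illustrates that idea under stated assumptions. It takes the usual sample spatial depth of x, namely 1 minus the norm of the average of the unit vectors (x - x_i)/||x - x_i||, and computes it in a kernel feature space through the identity ||phi(x) - phi(y)||^2 = k(x,x) + k(y,y) - 2k(x,y). The function names, the Gaussian bandwidth `sigma`, and the choice to normalize by the number of sample points distinct from x are illustrative assumptions, not the authors' code. With the linear kernel, the sketch reduces to the ordinary spatial depth.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, y: Vector) -> float:
    # Dot product: with this kernel the sketch reduces to the ordinary spatial depth.
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(sigma: float) -> Kernel:
    # Assumed Gaussian (RBF) kernel exp(-||x - y||^2 / sigma^2); sigma is a hypothetical bandwidth.
    def k(x: Vector, y: Vector) -> float:
        sq = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-sq / sigma ** 2)
    return k


def kernel_spatial_depth(x: Vector, data: Sequence[Vector], k: Kernel) -> float:
    """1 - norm of the average unit vector (phi(x) - phi(x_i)) / ||phi(x) - phi(x_i)||,
    evaluated only through kernel calls (hypothetical helper, not the authors' code)."""
    kxx = k(x, x)
    pts: List[Vector] = []
    kx: List[float] = []
    deltas: List[float] = []
    for xi in data:
        kxi = k(x, xi)
        # Squared feature-space distance ||phi(x) - phi(x_i)||^2 via the kernel trick.
        d2 = kxx + k(xi, xi) - 2.0 * kxi
        if d2 > 1e-12:  # skip sample points that coincide with x
            pts.append(xi)
            kx.append(kxi)
            deltas.append(math.sqrt(d2))
    if not pts:
        return 1.0  # convention: no distinct points to compare against
    # ||sum_i u_i||^2 = sum_{i,j} <phi(x)-phi(x_i), phi(x)-phi(x_j)> / (delta_i * delta_j)
    total = 0.0
    for i, xi in enumerate(pts):
        for j, xj in enumerate(pts):
            inner = kxx - kx[i] - kx[j] + k(xi, xj)
            total += inner / (deltas[i] * deltas[j])
    # max() guards against tiny negative round-off before the square root.
    return 1.0 - math.sqrt(max(total, 0.0)) / len(pts)


def is_novel(x: Vector, data: Sequence[Vector], k: Kernel, threshold: float) -> bool:
    # Declare x novel when its depth falls below the threshold.
    return kernel_spatial_depth(x, data, k) < threshold


if __name__ == "__main__":
    sample = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    print(kernel_spatial_depth((0.0, 0.0), sample, linear_kernel))  # 1.0 at the center
    print(is_novel((10.0, 0.0), sample, linear_kernel, threshold=0.1))  # True
```

    For a fixed kernel, the threshold is the sketch's only free parameter. The false alarm bound that the abstract proposes for choosing it is not reproduced here.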
  • Keywords
    biology computing; data mining; learning (artificial intelligence); probability; zoology; center-outward ordering; depth-based detector; depth-based novelty detection; false alarm probability; kernelized spatial depth; machine learning; mathematical tractability; multidimensional data; new species identification; statistical depth functions; taxonomic research; Character recognition; Data mining; Detectors; Earth; Humans; Kernel; Machine learning; Support vector machines; Taxonomy; USA Councils
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on
  • Conference_Location
    Omaha, NE
  • ISSN
    1550-4786
  • Print_ISBN
    978-0-7695-3018-5
  • Type
    conf
  • DOI
    10.1109/ICDM.2007.10
  • Filename
    4470235