• DocumentCode
    1785080
  • Title
    LA2SNE: A novel stochastic neighbor embedding approach for microbiome data visualization
  • Author
    Weiwei Xu ; Rong Xie ; Xingpeng Jiang ; Xiaohua Hu
  • Author_Institution
    Int. Sch. of Software, Wuhan Univ., Wuhan, China
  • fYear
    2014
  • fDate
    2-5 Nov. 2014
  • Firstpage
    31
  • Lastpage
    37
  • Abstract
    Visualization of large-scale data is a first step toward gaining preliminary insight into complex biological data. In recent years, many statistical methods have been designed to support data visualization. Stochastic Neighbor Embedding (SNE) is one of these efficient approaches; it uses probabilistic distances to model differences among data points within the data space. SNE and its variants (e.g. t-SNE) have demonstrated superiority over other methods in exploring complex data. With these methods, however, similar data points tend to group together, which prevents the identification of subtle differences among them. A good visualization method should not only present clear data structure, but also distinguish subtle differences. In this paper, we propose a novel extension of SNE. The approach has three innovations: (1) We replace the Gaussian distribution in SNE with a Laplacian distribution in both the high-dimensional space and the low-dimensional space. The Laplace distribution has heavier tails than the Gaussian distribution, and thus it can be used to overcome the over-crowding problem noted in SNE and its variants. (2) We use a symmetric modification of the Kullback-Leibler divergence measure as the objective function, which provides more flexibility to the model. (3) We add graph Laplacian regularization terms to the objective function, which help preserve the manifold structure among data points. Experiments on simulated data and human microbiome data indicate that the approach gives better visualization performance than other methods in distinguishing crowded data points.
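The abstract names the three ingredients but gives no formulas, so the following is only a rough sketch of how such an objective could be assembled, not the authors' implementation. Everything specific here is an assumption made for illustration: the function names, the kernel form `exp(-d / scale)` on Euclidean distances, the use of the Jeffreys divergence KL(P||Q) + KL(Q||P) as the "symmetric modification", the reuse of the high-dimensional affinities as graph weights, and the weight `lam`.

```python
import math


def laplace_affinities(points, scale=1.0):
    """Joint probabilities p_ij proportional to exp(-||x_i - x_j|| / scale).

    Assumption: a Laplace-style kernel on Euclidean distance, normalized
    over all pairs i != j. The paper's exact definition may differ.
    """
    n = len(points)
    w = [[math.exp(-math.dist(points[i], points[j]) / scale) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))
    return [[w[i][j] / total for j in range(n)] for i in range(n)]


def symmetric_kl(p, q, eps=1e-12):
    """KL(P||Q) + KL(Q||P), which equals the sum of (p - q) * log(p / q)."""
    n = len(p)
    cost = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                pij, qij = max(p[i][j], eps), max(q[i][j], eps)
                cost += (pij - qij) * math.log(pij / qij)
    return cost


def graph_laplacian_penalty(weights, embedding):
    """tr(Y^T L Y) with L = D - W, written as 0.5 * sum_ij w_ij ||y_i - y_j||^2."""
    n = len(embedding)
    return 0.5 * sum(weights[i][j] * math.dist(embedding[i], embedding[j]) ** 2
                     for i in range(n) for j in range(n))


def objective(high, low, lam=0.1, scale=1.0):
    """Symmetric divergence plus a graph Laplacian regularizer (hypothetical form)."""
    p = laplace_affinities(high, scale)  # affinities in the data space
    q = laplace_affinities(low, scale)   # affinities in the embedding
    return symmetric_kl(p, q) + lam * graph_laplacian_penalty(p, low)


# Toy usage: three 3-D points and a candidate 2-D embedding.
high = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 5.0, 5.0]]
low = [[0.0, 0.0], [1.0, 0.0], [4.0, 4.0]]
print(objective(high, low))
```

In practice the embedding `low` would be optimized, for example by gradient descent on this cost; the sketch only evaluates the cost for a fixed embedding.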
  • Keywords
    Gaussian distribution; Laplace equations; bioinformatics; data structures; data visualisation; genomics; microorganisms; statistical databases; Kullback-Leibler divergence measure; LA2SNE; Laplacian distribution; complex biological data; complex data; crowding data points; data space; data structure; graph Laplacian regularization terms; high-dimensional space; human microbiome data; large-scale data visualization; low-dimensional space; manifold structure; microbiome data visualization; novel stochastic neighbor embedding approach; probabilistic distance; statistical visualization methods; Cost function; Data visualization; Linear programming; Principal component analysis; Probabilistic logic; Stochastic processes; Dimension reduction; Laplacian regularization; Microbiome
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
  • Conference_Location
    Belfast
  • Type
    conf
  • DOI
    10.1109/BIBM.2014.6999294
  • Filename
    6999294