• DocumentCode
    586380
  • Title
    Visualizing high dimensional datasets using parallel coordinates: Application to gene prioritization
  • Author
    Boogaerts, T.; Tranchevent, L.; Pavlopoulos, Georgios A.; Aerts, Jan; Vandewalle, Joos
  • Author_Institution
    Leuven Future Health Dept., Katholieke Univ. Leuven, Leuven, Belgium
  • fYear
    2012
  • fDate
    11-13 Nov. 2012
  • Firstpage
    52
  • Lastpage
    57
  • Abstract
    In this paper, we introduce a visualization tool for the interactive and efficient exploration of high-dimensional data using parallel coordinates. An algorithm is developed to find an optimal permutation of the dimensions, which lets the data miner immediately see the most important features or irregularities in the dataset. It is implemented as a genetic algorithm based on the travelling salesman problem, using maximal correlation as the fitness. Other features of the tool include selection operators for grouping the data, such as selection by intersection or by angle; orthogonal and density plots that complement the parallel coordinates plot; manual rearrangement of the dimension order; the option to show every plot needed to see all relations between dimensions; and display of a specified number of standard deviations for each dimension separately. The tool is applied to multiple gene prioritization cases in search of genes relevant to specific genetic disorders. The datasets used were obtained with the MerKator and Endeavour tools and include breast cancer, cataract, Charcot-Marie-Tooth and cardiomyopathy datasets, as well as a dataset relating 29 diseases to 22206 genes. Our tool, manual and data can be downloaded from http://www.toomas.be/parcoord/.
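    The abstract states only that the axis order is found by a genetic algorithm framed as a travelling salesman problem with maximal correlation as fitness. The Python sketch below is one minimal way to realize that idea and is not the authors' implementation. It assumes: fitness is the sum of absolute Pearson correlations between adjacent axes (an open path rather than a closed tour, since the first and last axes of a parallel coordinates plot are not neighbours); tournament selection; an order-crossover (OX) variant; swap mutation; and elitism. All function names and parameter values (`ga_axis_order`, `pop_size`, `generations`, `mutation_rate`, `seed`) are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Absolute Pearson correlation between every pair of dimensions (columns)."""
    d = len(columns)
    return [[abs(pearson(columns[i], columns[j])) for j in range(d)] for i in range(d)]


def fitness(order, corr):
    """Assumed fitness: summed |r| over axes that are adjacent in the plot (open path)."""
    return sum(corr[a][b] for a, b in zip(order, order[1:]))


def order_crossover(p1, p2, rng):
    """OX variant: keep a random slice of p1, fill remaining slots left to right in p2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(child[i:j + 1])
    fill = [g for g in p2 if g not in kept]
    for k in range(n):
        if child[k] is None:
            child[k] = fill.pop(0)
    return child


def swap_mutation(order, rng, rate):
    """With probability `rate`, swap two randomly chosen axes."""
    order = list(order)
    if rng.random() < rate:
        a, b = rng.sample(range(len(order)), 2)
        order[a], order[b] = order[b], order[a]
    return order


def ga_axis_order(columns, pop_size=40, generations=200, mutation_rate=0.2, seed=0):
    """Return a permutation of column indices that (approximately) maximizes `fitness`."""
    d = len(columns)
    if d < 3:
        return list(range(d))  # with fewer than 3 axes every order scores the same
    corr = correlation_matrix(columns)
    rng = random.Random(seed)
    # Seed the population with the input order so the result is never worse than it.
    pop = [list(range(d))] + [rng.sample(range(d), d) for _ in range(pop_size - 1)]

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a, corr) >= fitness(b, corr) else b

    for _ in range(generations):
        elite = max(pop, key=lambda o: fitness(o, corr))  # elitism: best order survives
        nxt = [elite]
        while len(nxt) < pop_size:
            child = order_crossover(tournament(), tournament(), rng)
            nxt.append(swap_mutation(child, rng, mutation_rate))
        pop = nxt
    return max(pop, key=lambda o: fitness(o, corr))


if __name__ == "__main__":
    t = list(range(8))
    # Columns 0 and 2 are perfectly correlated; column 1 is only weakly related to both.
    columns = [t, [0, 1, 0, 1, 0, 1, 0, 1], [2 * v + 1 for v in t]]
    print(ga_axis_order(columns))
```

    In the usage example, a good order places the two perfectly correlated columns side by side and moves the weakly correlated alternating column to an end. Because the input order is part of the initial population and the best individual is always carried forward, the returned order never scores below the original one under this assumed fitness.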
  • Keywords
    cancer; data mining; data visualisation; genetic algorithms; genetics; medical computing; medical disorders; travelling salesman problems; Charcot-Marie-Tooth; Endeavour tool; MerKator tool; breast cancer; cardiomyopathy dataset; cataract; data grouping; data miner; diseases; fitness; gene prioritization; genetic algorithm; genetic disorders; high-dimensional dataset visualization tool; maximal correlation; optimal dimension permutation; parallel coordinate plot; selection operators; travelling salesman problem; Breast cancer; Correlation; Data visualization; Diseases; Genetic algorithms; Genetics; Proteins; data visualization; gene prioritization; genetic algorithm; parallel coordinates
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics & Bioengineering (BIBE), 2012 IEEE 12th International Conference on
  • Conference_Location
    Larnaca
  • Print_ISBN
    978-1-4673-4357-2
  • Type
    conf
  • DOI
    10.1109/BIBE.2012.6399706
  • Filename
    6399706