• DocumentCode
    3372956
  • Title
    Learning metrics for exploratory data analysis
  • Author
    Kaski, Samuel
  • Author_Institution
    Neural Networks Res. Centre, Helsinki Univ. of Technol., Finland
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    53
  • Lastpage
    62
  • Abstract
    Visualization and cluster analysis of multivariate data is usually based on distances between samples in a data space. The distance measure is often heuristically chosen, for instance by choosing suitable features and then using a global Euclidean metric. We have developed methods that remove the arbitrariness by measuring distances only along important (local) directions. The metric is learned from auxiliary data that is paired with the primary data during the learning process. It is assumed that changes in the primary data are important or relevant if they cause changes in the auxiliary data; for example, in analysis of gene expression the auxiliary data can indicate the functional classes of the genes. The new distance measures can be used, for instance, in clustering and Self-Organizing Map-based data visualization. The methods have so far been applied in analysis of bankruptcy, text documents, and gene expression.
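The idea that a change in the primary data matters only if it changes the auxiliary data can be illustrated with a small sketch. It assumes one common formulation of learning metrics that the abstract does not spell out: the local squared distance of a small step dx at x is dx^T J(x) dx, where J(x) is the Fisher information matrix of the conditional distribution p(c|x) of the auxiliary class c. It further assumes, purely for illustration, that p(c|x) is a softmax model with hypothetical weights W and biases b. The names `softmax_probs` and `local_sq_distance` are illustrative, not taken from the paper, and the sketch covers only the local (infinitesimal) distance, not distances integrated along paths.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax_probs(W: List[List[float]], b: List[float], x: Vector) -> List[float]:
    """p(c|x) under a hypothetical softmax model (one weight row per class)."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + bc for w, bc in zip(W, b)]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def local_sq_distance(W: List[List[float]], b: List[float], x: Vector, dx: Vector) -> float:
    """Squared local length dx^T J(x) dx, with J(x) the Fisher information of p(c|x).

    Assumption: for the softmax model, grad_x log p(c|x) = w_c - w_bar, where
    w_bar = sum_c p(c|x) w_c, so dx^T J(x) dx = sum_c p(c|x) * ((w_c - w_bar) . dx)^2.
    """
    p = softmax_probs(W, b, x)
    dim = len(x)
    # Expected weight vector under p(c|x)
    w_bar = [sum(p[c] * W[c][j] for c in range(len(W))) for j in range(dim)]
    total = 0.0
    for c, pc in enumerate(p):
        # Directional derivative of log p(c|x) along dx
        g_dot_dx = sum((W[c][j] - w_bar[j]) * dx[j] for j in range(dim))
        total += pc * g_dot_dx ** 2
    return total


if __name__ == "__main__":
    # Hypothetical two-class model whose classes depend only on the first feature
    W = [[1.0, 0.0], [-1.0, 0.0]]
    b = [0.0, 0.0]
    x = [0.0, 0.0]
    print(local_sq_distance(W, b, x, [0.1, 0.0]))  # step along the relevant feature
    print(local_sq_distance(W, b, x, [0.0, 0.1]))  # step along the irrelevant feature
```

Both steps have the same Euclidean length, but under this assumed model the step along the second feature gets zero learning-metric length because it leaves p(c|x) unchanged, while the step along the first feature gets a positive length (about 0.01). This mirrors the abstract's claim that distances are measured only along directions that are relevant to the auxiliary data.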
  • Keywords
    data analysis; self-organising feature maps; software metrics; unsupervised learning; Self-Organizing Map-based data visualization; Unsupervised learning; auxiliary data; cluster analysis; distance measures; exploratory data analysis; gene expression; Chromium; Data analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks for Signal Processing XI, 2001. Proceedings of the 2001 IEEE Signal Processing Society Workshop
  • Conference_Location
    North Falmouth, MA
  • ISSN
    1089-3555
  • Print_ISBN
    0-7803-7196-8
  • Type
    conf
  • DOI
    10.1109/NNSP.2001.943110
  • Filename
    943110