DocumentCode
3372956
Title
Learning metrics for exploratory data analysis
Author
Kaski, Samuel
Author_Institution
Neural Networks Research Centre, Helsinki University of Technology, Finland
fYear
2001
fDate
2001
Firstpage
53
Lastpage
62
Abstract
Visualization and cluster analysis of multivariate data are usually based on distances between samples in a data space. The distance measure is often chosen heuristically, for instance by selecting suitable features and then using a global Euclidean metric. We have developed methods that remove this arbitrariness by measuring distances only along important (local) directions. The metric is learned from auxiliary data that is paired with the primary data during the learning process. It is assumed that changes in the primary data are important or relevant if they cause changes in the auxiliary data; for example, in the analysis of gene expression, the auxiliary data can indicate the functional classes of the genes. The new distance measures can be used, for instance, in clustering and in Self-Organizing Map-based data visualization. The methods have so far been applied in the analysis of bankruptcy, text documents, and gene expression.
Keywords
data analysis; self-organising feature maps; software metrics; unsupervised learning; Self-Organizing Map-based data visualization; auxiliary data; cluster analysis; distance measures; exploratory data analysis; gene expression; Chromium
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks for Signal Processing XI, 2001. Proceedings of the 2001 IEEE Signal Processing Society Workshop
Conference_Location
North Falmouth, MA
ISSN
1089-3555
Print_ISBN
0-7803-7196-8
Type
conf
DOI
10.1109/NNSP.2001.943110
Filename
943110