Title :
Distance Metric Learning through Optimization of Ranking
Author :
Gopal, Kreshna ; Ioerger, Thomas R.
Author_Institution :
Texas A&M Univ., College Station
Abstract :
Data preprocessing is important in machine learning, data mining, and pattern recognition. In particular, selecting relevant features in high-dimensional data is often necessary to efficiently construct models that accurately describe the data. For example, many lazy learning algorithms (like k-Nearest Neighbor) rely on feature-based distance metrics to compare input patterns for the purpose of classification or retrieval from a database. In previous work, we introduced Slider, a distance metric learning method that optimizes the weights of features in a protein model-building application (where features are used to describe patterns of electron density around protein macromolecules). In this work, we demonstrate the usefulness of Slider as a general method for classification, ranking and retrieval, with results on several benchmark datasets. We also compare it to other well-known feature selection or weighting methods.
Keywords :
biology computing; data mining; information retrieval; learning (artificial intelligence); optimization; pattern classification; proteins; Slider algorithm; data preprocessing; distance metric learning; machine learning; pattern recognition; protein model-building application; learning systems; machine learning algorithms; nearest neighbor searches; spatial databases
Conference_Title :
Data Mining Workshops, 2007. ICDM Workshops 2007. Seventh IEEE International Conference on
Conference_Location :
Omaha, NE
Print_ISBN :
978-0-7695-3019-2
Electronic_ISBN :
978-0-7695-3033-8
DOI :
10.1109/ICDMW.2007.113