DocumentCode :
607662
Title :
ID for data with multiple clusters
Author :
Ari, I. ; Cemgil, A.T. ; Akarun, Lale
Author_Institution :
Bilgisayar Muhendisligi Bolumu, Bogazici Univ., Bebek, Turkey
fYear :
2013
fDate :
24-26 April 2013
Firstpage :
1
Lastpage :
4
Abstract :
Interpolative decomposition (ID) is a matrix factorization that represents a data matrix using a subset of its own columns. The selected columns are meant to capture the salient features of the data. A very common ID approach in the literature is based on importance sampling: a statistical leverage score is computed for each column, and K columns are randomly selected according to these scores. These randomized methods aim for a better low-rank approximation of the matrix by seeking the columns that best express its range. This makes ID a good alternative to Singular Value Decomposition (SVD), since it favors sparsity and the bases correspond to real data points. However, the columns leading to the best low-rank approximation are usually not the most representative ones if the underlying data is composed of several clusters, which is very common in real life. In this paper, we introduce an alternative ID approach based on clustering. We employ K-medoids as an ID method for better interpretability and representativeness. We apply ID to handwritten digit recognition and compare the proposed approach with the state-of-the-art method in the literature. We show its superiority in terms of representativeness of the data. We demonstrate that most of the data can be discarded without compromising the accuracy.
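As a rough illustration of the two column-selection strategies contrasted in the abstract, the following Python sketch selects K columns once by leverage-score sampling and once by K-medoids, then rebuilds the matrix from the selected columns by least squares. It is not the authors' implementation: the function names, the synthetic three-cluster data, the Euclidean distance, and the simple alternating K-medoids variant are all assumptions made for the example.

```python
import numpy as np


def leverage_scores(A, rank):
    """Normalized leverage score of each column of A (scores sum to 1)."""
    # Column j of Vt[:rank] holds the coordinates of column j of A
    # in the top-`rank` right singular subspace.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    scores = np.sum(Vt[:rank] ** 2, axis=0)
    return scores / scores.sum()


def sample_columns(A, K, rank, rng):
    """Randomized ID step: draw K distinct columns with probability ~ leverage."""
    p = leverage_scores(A, rank)
    return rng.choice(A.shape[1], size=K, replace=False, p=p)


def k_medoids(A, K, rng, n_iter=50):
    """Simple alternating K-medoids on the columns of A (assumed variant)."""
    X = A.T  # one row per data point (column of A)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    medoids = rng.choice(X.shape[0], size=K, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # nearest medoid per point
        new = medoids.copy()
        for c in range(K):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the smallest total distance
                # to the other members of its cluster.
                within = D[np.ix_(members, members)].sum(axis=1)
                new[c] = members[np.argmin(within)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids


def id_reconstruct(A, cols):
    """Approximate A as C @ X, where C = A[:, cols] and X is a least-squares fit."""
    C = A[:, cols]
    X, *_ = np.linalg.lstsq(C, A, rcond=None)
    return C @ X


# Synthetic data (assumption): 3 clusters of 50 points each in 20 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(20, 3))
A = np.hstack([centers[:, [c]] + rng.normal(size=(20, 50)) for c in range(3)])

lev_cols = sample_columns(A, K=3, rank=3, rng=rng)
med_cols = k_medoids(A, K=3, rng=rng)

for name, cols in [("leverage sampling", lev_cols), ("K-medoids", med_cols)]:
    err = np.linalg.norm(A - id_reconstruct(A, cols)) / np.linalg.norm(A)
    print(f"{name}: columns {sorted(cols.tolist())}, relative error {err:.3f}")
```

Both routines return column indices into the original matrix, so the selected bases remain actual data points. The reconstruction error printed here only measures low-rank fit; judging representativeness, as the abstract does, would need a separate criterion such as classification accuracy using the selected columns.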
Keywords :
handwritten character recognition; matrix decomposition; object recognition; pattern clustering; singular value decomposition; ID approach; K-medoids; SVD; clustering; data matrix; handwritten digit recognition; interpolative decomposition; low-rank matrix approximation; matrix factorization; salient features; statistical leverage score; Algorithm design and analysis; Approximation algorithms; Approximation methods; Clustering algorithms; Machine learning algorithms; Monte Carlo methods
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Signal Processing and Communications Applications Conference (SIU), 2013 21st
Conference_Location :
Haspolat
Print_ISBN :
978-1-4673-5562-9
Electronic_ISBN :
978-1-4673-5561-2
Type :
conf
DOI :
10.1109/SIU.2013.6531308
Filename :
6531308