DocumentCode :
834022
Title :
Metric learning for text documents
Author :
Lebanon, Guy
Author_Institution :
Dept. of Stat., Purdue Univ., West Lafayette, IN, USA
Volume :
28
Issue :
4
fYear :
2006
fDate :
4/1/2006 12:00:00 AM
Firstpage :
497
Lastpage :
508
Abstract :
Many algorithms in machine learning rely on being given a good distance metric over the input space. Rather than using a default metric such as the Euclidean metric, it is desirable to obtain a metric based on the provided data. We consider the problem of learning a Riemannian metric associated with a given differentiable manifold and a set of points. Our approach to the problem involves choosing a metric from a parametric family that is based on maximizing the inverse volume of a given data set of points. From a statistical perspective, it is related to maximum likelihood under a model that assigns probabilities inversely proportional to the Riemannian volume element. We discuss in detail learning a metric on the multinomial simplex where the metric candidates are pull-back metrics of the Fisher information under a Lie group of transformations. When applied to text document classification, the resulting geodesic distances resemble, but outperform, the tfidf cosine similarity measure.
Keywords :
Lie groups; differential geometry; learning (artificial intelligence); statistical analysis; text analysis; transforms; Fisher information; Lie group; Riemannian metric; Riemannian volume element; differentiable manifold; geodesic distance; inverse volume maximization; machine learning; maximum likelihood; metric learning; multinomial simplex; pull-back metrics; text documents; tfidf cosine similarity measure; Euclidean distance; Geometry; Joining processes; Kernel; Level measurement; Machine learning; Machine learning algorithms; Neural networks; Probability; Text analysis; Distance learning; Algorithms; Artificial Intelligence; Automatic Data Processing; Computer Graphics; Documentation; Image Enhancement; Image Interpretation, Computer-Assisted; Information Storage and Retrieval; Numerical Analysis, Computer-Assisted; Pattern Recognition, Automated; Signal Processing, Computer-Assisted; User-Computer Interface
fLanguage :
English
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher :
IEEE
ISSN :
0162-8828
Type :
jour
DOI :
10.1109/TPAMI.2006.77
Filename :
1597108