Abstract :
Learning to rank is an important task for many data mining applications. Essentially, the goal of learning to rank is to learn an appropriate similarity or distance metric that determines the relevance relationships among data points. However, most existing approaches to distance metric learning are limited in three respects. First, they often assume a fixed form of distance metric for the entire input space. Second, the assumed distance functions, such as the Mahalanobis distance, are often computationally expensive or even intractable to learn for high-dimensional data. Third, most of these approaches lack robustness to noisily labeled data, which is pervasive in many real-world applications. In this paper, we study learning to rank as a problem of distance metric learning in order to address these three problems. We choose the Bregman distance as the target distance function because its general functional form subsumes a wide class of distance functions and because it can exploit complicated nonlinear patterns underlying the data. Under the framework of structural SVM, we formulate the problem of learning Bregman distance functions for ranking as a quadratic programming (QP) problem using a nonparametric approach, and present an effective algorithm for solving it. Furthermore, we propose a self-reinforcement scheme that adaptively differentiates the role each data point plays in learning, in order to secure robustness. We emphasize that the proposed method, SBLR-S (Structural Bregman distance functions Learning to Rank with Self-reinforcement), is more general than conventional distance metric learning approaches and is able to handle high-dimensional data as well as noisily labeled data. Ranking experiments on real-world datasets show that the method outperforms state-of-the-art approaches.
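To illustrate why the Bregman distance is described as a general functional form, the sketch below uses the standard definition D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> for a strictly convex function phi. It is not the SBLR-S algorithm: nothing is learned here, and the function names, the matrix A, and the argument order used for ranking are assumptions made for illustration. Choosing phi(x) = 0.5 x^T A x recovers half the squared Mahalanobis distance, and choosing the negative entropy recovers the KL divergence on probability vectors.

```python
import numpy as np

def bregman_distance(phi, grad_phi, x, y):
    """Standard Bregman distance: phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# phi(x) = 0.5 * ||x||^2  ->  half the squared Euclidean distance
def half_sq_norm(x):
    return 0.5 * np.dot(x, x)

def half_sq_norm_grad(x):
    return x

# phi(x) = 0.5 * x^T A x with A symmetric positive definite
# -> half the squared Mahalanobis distance. A is a hypothetical example matrix.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def quad(x):
    return 0.5 * x @ A @ x

def quad_grad(x):
    return A @ x

# phi(x) = sum_i x_i log x_i (negative entropy, x > 0)
# -> generalized KL divergence; plain KL when x and y each sum to 1
def neg_entropy(x):
    return np.sum(x * np.log(x))

def neg_entropy_grad(x):
    return np.log(x) + 1.0

def rank_by_bregman(phi, grad_phi, query, candidates):
    """Return candidate indices sorted by D_phi(candidate, query), nearest first.

    Bregman distances are asymmetric in general, so the argument order
    used here is an illustrative convention, not taken from the paper.
    """
    dists = [bregman_distance(phi, grad_phi, c, query) for c in candidates]
    return np.argsort(dists)

x = np.array([0.2, 0.8])
y = np.array([0.5, 0.5])
d_euclid = bregman_distance(half_sq_norm, half_sq_norm_grad, x, y)
d_mahal = bregman_distance(quad, quad_grad, x, y)
d_kl = bregman_distance(neg_entropy, neg_entropy_grad, x, y)

candidates = np.array([[1.0, 1.0], [0.0, 0.0], [3.0, 3.0]])
order = rank_by_bregman(half_sq_norm, half_sq_norm_grad,
                        np.array([0.0, 0.0]), candidates)
```

In this sketch the convex function phi is fixed by hand; the approach described in the abstract instead learns the Bregman distance function nonparametrically from ranking data.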
Keywords :
data mining; generalisation (artificial intelligence); learning (artificial intelligence); nonparametric statistics; support vector machines; Bregman distance; Bregman distance function learning problem; Mahalanobis distance; QP problem; SBLR-S method; complicated nonlinear patterns; data mining applications; data points; distance functions; distance metric learning approach; nonparametric approach; self-reinforcement; self-reinforcement scheme; structural Bregman distance function learning; structural Bregman distance functions learning to rank with self-reinforcement method; structural SVM; Data models; Kernel; Measurement; Optimization; Robustness; Training; distance metric learning; learning to rank; structural learning