Author_Institution :
Dept. of Comput. Sci. & Eng., Hong Kong Univ. of Sci. & Technol., Hong Kong, China
Abstract :
Hierarchical multilabel classification allows a sample to belong to multiple class labels residing on a hierarchy, which can be a tree or directed acyclic graph (DAG). However, popular hierarchical loss functions, such as the H-loss, can only be defined on tree hierarchies (but not on DAGs), and may also under- or over-penalize misclassifications near the bottom of the hierarchy. Besides, it has been relatively unexplored on how to make use of the loss functions in hierarchical multilabel classification. To overcome these deficiencies, we first propose hierarchical extensions of the Hamming loss and ranking loss which take the mistake at every node of the label hierarchy into consideration. Then, we first train a general learning model, which is independent of the loss function. Next, using Bayesian decision theory, we develop Bayes-optimal predictions that minimize the corresponding risks with the trained model. Computationally, instead of requiring an exhaustive summation and search for the optimal multilabel, the resultant optimization problem can be efficiently solved by a greedy algorithm. Experimental results on a number of real-world data sets show that the proposed Bayes-optimal classifier outperforms state-of-the-art methods.
Keywords :
Bayes methods; decision theory; pattern classification; Bayes-optimal hierarchical multilabel classification; Bayesian decision theory; Hamming loss; greedy algorithm; optimization problem; ranking loss; Bayes methods; Bismuth; Decision theory; Greedy algorithms; Optimization; Prediction algorithms; Training; Bayesian decision theory; Hierarchical classification; hierarchical classification; loss function; multilabel classification;