Title of article
An exact test of the accuracy of binary classification models based on the probability distribution of the average rank
Author/Authors
May, Jerrold H. and Vargas, Luis G.
Issue Information
Journal issue with serial number, year 2009
Pages
9
From page
975
To page
983
Abstract
We propose a new way to evaluate the discriminatory power of models that generate a continuous value as the basis for performing a binary classification task. Our hypothesis test uses the average rank of the k successes in the sample of size n, based on those continuous values. We derive the probability mass function for the average rank from the coefficients of a Gaussian polynomial distribution that results from randomly sampling k distinct positive integers, all n or less. The significance level of the test is found by counting the number of arrangements that produce average ranks more extreme than the one observed. Recursive relationships can be used to calculate the values necessary to compute the p-value. For large values of k and n, for which exact computation might be prohibitive, we present numerical results which indicate that the critical values of the distribution are nearly linear in n for a fixed k and that the coefficients of the linear relationships are nonlinear functions of k and the desired percentile. We develop regression models for those relationships to approximate the number of arrangements in order to make the test practical for large values of k and n.
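The counting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `rank_sum_counts` and `upper_tail_p_value` are hypothetical, the recursion used is the standard subset-sum recurrence (a k-subset of {1..n} either contains n or it does not), and the tail is assumed to be one-sided and to include the observed value. Because the average rank equals the rank sum divided by k, the sketch works with the integer rank sum.

```python
from math import comb


def rank_sum_counts(n, k):
    """Return counts where counts[s] is the number of k-subsets of
    {1, ..., n} whose elements sum to s (hypothetical helper)."""
    # Largest possible sum: n + (n - 1) + ... + (n - k + 1).
    max_sum = k * (2 * n - k + 1) // 2
    # table[j][s]: number of j-subsets of the integers processed so far with sum s.
    table = [[0] * (max_sum + 1) for _ in range(k + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        # Go through subset sizes in descending order so that integer i
        # is used at most once per subset.
        for j in range(min(i, k), 0, -1):
            row, prev = table[j], table[j - 1]
            for s in range(i, max_sum + 1):
                row[s] += prev[s - i]
    return table[k]


def upper_tail_p_value(n, k, observed_rank_sum):
    """Probability, under random ranking, that the rank sum of the k
    successes is at least observed_rank_sum (assumed one-sided test)."""
    counts = rank_sum_counts(n, k)
    tail = sum(counts[max(observed_rank_sum, 0):])
    return tail / comb(n, k)


if __name__ == "__main__":
    # Example: n = 4 cases, k = 2 successes ranked 2 and 4 (rank sum 6, average rank 3).
    print(upper_tail_p_value(4, 2, 6))
```

For n = 4 and k = 2 there are six equally likely arrangements, and two of them have a rank sum of 6 or more, so the example evaluates to 2/6. The table has on the order of k times the maximum sum entries, which hints at why exact computation becomes expensive for large k and n.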
Keywords
Data mining, Model evaluation, Gaussian polynomial, Hypergeometric series
Journal title
Mathematical and Computer Modelling
Serial Year
2009
Record number
1596566