DocumentCode :
186026
Title :
Formal analysis of cross-validation based on decremental sampling scheme
Author :
Tsumoto, Shusaku ; Hirano, Shoji
Author_Institution :
Dept. of Med. Inf., Shimane Univ., Izumo, Japan
fYear :
2014
fDate :
22-24 Oct. 2014
Firstpage :
298
Lastpage :
303
Abstract :
The cross-validation method has been widely used to estimate the performance of classifiers and statistical methods. However, compared with other resampling methods, cross-validation has not yet been investigated theoretically, since its sampling scheme is based not only on stochasticity but also on set-based processing. This paper proposes a new framework for evaluating cross-validation methods based on a decremental sampling scheme. The decremental sampling scheme is the reverse of incremental sampling: it gives estimated values of statistical indices when some samples are deleted from an original dataset, dual to those obtained by the incremental scheme. Interestingly, if we focus on one target relation and one target concept, four possibilities should be considered for the deletion of an example, and the updates of the statistical indices are obtained as simple formulae that depend on the characteristics of the original dataset. We applied this technique to the leave-one-out method for rules defined by propositions whose constraints are given by inequalities on accuracy and coverage. The following results are obtained. First, the estimated values of the statistical indices are equal to the original values. Second, the variance of these indices is obtained as a simple formula of the total number of examples and the sizes of the supporting sets of the target relation and the target concept, and it converges to 0 when the given dataset is sufficiently large. Thus, the main contribution to the estimation comes from the use of thresholds on the statistical indices. In particular, rules located on the boundary of the thresholds lead to complex behavior of the cross-validated estimates. The results show that the evaluation framework gives a powerful tool for evaluating the leave-one-out method.
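The four-case decremental update and the two leave-one-out results described in the abstract can be illustrated with a small sketch. It assumes the usual rough-set definitions, accuracy = |R ∩ D| / |R| and coverage = |R ∩ D| / |D|, where R is the supporting set of the target relation and D that of the target concept. It also assumes that "variance" means the population variance of the n leave-one-out values. The function names `delete_one` and `leave_one_out` are hypothetical and are not taken from the paper. Under these assumptions, the mean of the leave-one-out accuracy equals the original |R ∩ D| / |R|. Its variance works out to rd(r - rd) / (n r (r - 1)^2), with n the total number of examples, r = |R| and rd = |R ∩ D|, so it shrinks toward 0 as n grows with the proportions held fixed. Thresholds on the indices are not modeled here.

```python
from fractions import Fraction

# The four possibilities for a deleted example, relative to the supporting
# set R of the target relation and the supporting set D of the target concept.
CASES = ("R&D", "R only", "D only", "neither")


def delete_one(r, d, rd, case):
    """Hypothetical helper: decremental update of (|R|, |D|, |R & D|)
    after deleting a single example that falls into `case`."""
    if case == "R&D":
        return r - 1, d - 1, rd - 1
    if case == "R only":
        return r - 1, d, rd
    if case == "D only":
        return r, d - 1, rd
    if case == "neither":
        return r, d, rd
    raise ValueError(f"unknown case: {case}")


def leave_one_out(n, r, d, rd):
    """Hypothetical helper: exact mean and population variance of accuracy
    and coverage over all n leave-one-out deletions.

    n: total examples, r = |R|, d = |D|, rd = |R & D|.
    Assumed definitions: accuracy = rd / r, coverage = rd / d.
    """
    if not (r >= 2 and d >= 2 and 0 <= rd <= min(r, d) and r + d - rd <= n):
        raise ValueError("need r, d >= 2 and consistent set sizes")
    # How many of the n examples fall into each of the four cases.
    weight = {"R&D": rd, "R only": r - rd, "D only": d - rd,
              "neither": n - r - d + rd}
    indices = (("accuracy", lambda r1, d1, rd1: Fraction(rd1, r1)),
               ("coverage", lambda r1, d1, rd1: Fraction(rd1, d1)))
    stats = {}
    for name, index in indices:
        # Each case yields one index value, shared by `weight[case]` deletions.
        vals = [(weight[c], index(*delete_one(r, d, rd, c))) for c in CASES]
        mean = sum(w * v for w, v in vals) / n
        var = sum(w * (v - mean) ** 2 for w, v in vals) / n
        stats[name] = (mean, var)
    return stats


stats = leave_one_out(n=10, r=4, d=5, rd=3)
print(stats["accuracy"])  # (Fraction(3, 4), Fraction(1, 120))
print(stats["coverage"])  # (Fraction(3, 5), Fraction(3, 400))
```

Exact `Fraction` arithmetic is used so that the equality between the leave-one-out mean and the original index shows up exactly rather than up to floating-point error.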
Keywords :
pattern classification; probability; rough set theory; sampling methods; accuracy inequalities; complex behavior; coverage inequalities; cross-validation method; dataset characteristics; decremental sampling scheme; formal analysis; incremental sampling; leave-one-out method; set-based processing; statistical indices; stochasticity; target concept; target relation; threshold boundary; Accuracy; Approximation methods; Educational institutions; Error analysis; Estimation; Probabilistic logic; Set theory; accuracy; coverage; decremental sampling scheme; leave-one-out; rule induction;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Granular Computing (GrC), 2014 IEEE International Conference on
Conference_Location :
Noboribetsu
Type :
conf
DOI :
10.1109/GRC.2014.6982853
Filename :
6982853