Title :
Estimating concept difficulty with cross-entropy
Author :
Nazar, K.; Bramer, M.A.
Author_Institution :
School of Comput. Sci. & Math., Portsmouth Univ., UK
Abstract :
Learning difficulty can vary considerably from one algorithm to another because different approaches may be biased or tuned towards a particular initial problem description. Reasons for poor performance include noise, class distribution, and the number of attributes or examples in the sample. However, when intrinsic accuracy is high yet performance is poor, the problem can be caused by feature interaction: patterns are harder to identify because they are conditional. Systems that attempt to learn in such domains can perform constructive induction to transform the initial representation into one that makes classification information more visible, yet systems that reformulate example descriptions often do so regardless of the initial representation. The authors present a data-based detection measure that estimates concept difficulty. Several measures, including μ-ness, variation, blurring (Δ) and Δj, are compared. They argue that a measure based only on the a posteriori probability of the class variable has limited use, and that disparities between concept difficulty and blurring results on some data sets can be explained by a simple technique that averages the blurring measure over the subsets generated by splitting on the best attribute.
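Note: the record does not give the paper's formulas, so the Python sketch below only illustrates the averaging idea described in the abstract. It assumes the blurring-style measure is the entropy of the a posteriori class distribution and that subset results are weighted by subset size; the names class_entropy, averaged_entropy_after_split and best_attribute_split_entropy are hypothetical, not taken from the paper.

import math
from collections import Counter

def class_entropy(labels):
    # Entropy (bits) of the empirical class distribution: a stand-in for a
    # measure based only on the a posteriori probability of the class variable.
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def averaged_entropy_after_split(examples, labels, attribute):
    # Average the class entropy over the subsets obtained by splitting on one
    # attribute, weighting each subset by its relative size.
    subsets = {}
    for example, label in zip(examples, labels):
        subsets.setdefault(example[attribute], []).append(label)
    total = len(labels)
    return sum(len(s) / total * class_entropy(s) for s in subsets.values())

def best_attribute_split_entropy(examples, labels):
    # Lowest averaged entropy over all single-attribute splits, i.e. the value
    # obtained by splitting on the "best" attribute.
    return min(averaged_entropy_after_split(examples, labels, a)
               for a in range(len(examples[0])))

# XOR-like concept: the unsplit entropy is 1 bit and stays at 1 bit after any
# single-attribute split, the kind of disparity the averaging step can expose.
xor_examples = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [0, 1, 1, 0]
print(class_entropy(xor_labels))                               # 1.0
print(best_attribute_split_entropy(xor_examples, xor_labels))  # 1.0

For a concept determined by a single attribute, the averaged value instead drops towards zero after splitting on that attribute, so the unsplit and post-split measures diverge only when features interact.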
Keywords :
learning (artificial intelligence); μ-ness; a posteriori probability; algorithm; attributes; blurring; class distribution; class variable; classification information; concept difficulty estimation; constructive induction; cross-entropy; data-based detection measure; examples; feature interaction; intrinsic accuracy; learning difficulty; noise; patterns; performance; variation;
Conference_Title :
Knowledge Discovery and Data Mining (Digest No. 1998/310), IEE Colloquium on
Conference_Location :
London
DOI :
10.1049/ic:19980548