Abstract:
Despite the widespread use of key concepts of the Neyman–Pearson (N–P) statistical
paradigm—type I and II errors, significance levels, power, confidence levels—they have
been the subject of philosophical controversy and debate for over 60 years. Both current
and long-standing problems of N–P tests stem from unclarity and confusion, even
among N–P adherents, as to how a test’s (pre-data) error probabilities are to be
used for (post-data) inductive inference as opposed to inductive behavior. We argue
that the relevance of error probabilities is to ensure that only statistical hypotheses
that have passed severe or probative tests are inferred from the data. The severity
criterion supplies a meta-statistical principle for evaluating proposed statistical
inferences, avoiding classic fallacies from tests that are overly sensitive, as well as
those not sensitive enough to particular errors and discrepancies.
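To give a rough sense of how the severity criterion operates post-data, here is a minimal numerical sketch using the standard one-sided Normal test T+ (H0: mu <= mu0 vs. H1: mu > mu0, sigma known) often used to illustrate severity; the function name and the particular numbers are illustrative assumptions, not taken from the text.

```python
# Illustrative sketch (assumed example, not the paper's own code):
# post-data severity for the one-sided Normal test T+.
from scipy.stats import norm

def severity_mu_greater_than(x_bar, mu1, sigma, n):
    """Severity with which the inference 'mu > mu1' passes, given an
    observed mean x_bar: SEV = P(test statistic <= observed; mu = mu1)."""
    se = sigma / n ** 0.5
    return norm.cdf((x_bar - mu1) / se)

# Assumed numbers: mu0 = 0, sigma = 1, n = 100, observed x_bar = 0.2,
# so the standardized statistic is 2.0 and H0 is rejected at the 0.025 level.
# 'mu > 0' passes with high severity (~0.98), but 'mu > 0.2' only with 0.5,
# guarding against reading too large a discrepancy into a sensitive test.
for mu1 in (0.0, 0.1, 0.2):
    print(mu1, round(severity_mu_greater_than(0.2, mu1, 1.0, 100), 3))
```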