Author/Authors :
Anderson, Robert P.; Lew, Daniel; Peterson, A. Townsend
Abstract :
The Genetic Algorithm for Rule-Set Prediction (GARP) is one of several current approaches to modeling species’ distributions using occurrence records and environmental data. Because of stochastic elements in the algorithm and underdetermination of the system (multiple solutions with the same value for the optimization criterion), no unique solution is produced. Furthermore, current implementations of GARP utilize only presence data, rather than both presence and absence (the more general case). Hence, variability among GARP models, which is typical of genetic algorithms, and complications in interpreting results based on asymmetrical (presence-only) input data make model selection critical. Generally, some locality records are randomly selected to build a distributional model, with others set aside to evaluate it. Here, we use intrinsic and extrinsic measures of model performance to determine whether optimal models can be identified based on objective intrinsic criteria, without resorting to an independent test data set. We modeled potential distributions of two rodents (Heteromys anomalus and Microryzomys minutus) and one passerine bird (Carpodacus mexicanus), creating 20 models for each species. For each model, we calculated intrinsic and extrinsic measures of omission and commission error, as well as composite indices of overall error. Although intrinsic and extrinsic composite measures of overall model performance were sometimes loosely related to each other, none was consistently associated with expert-judged model quality. In contrast, intrinsic and extrinsic measures were highly correlated for both omission and commission in the two widespread species (H. anomalus and C. mexicanus). Furthermore, a clear inverse relationship existed between omission and commission there, and the best models were consistently found at low levels of omission and moderate-to-high commission values. In contrast, all models for M. minutus showed low values of both omission and commission. Because models are based only on presence data (and not all areas are adequately sampled), the commission index reflects not only true commission error but also a component that results from undersampled areas that the species actually inhabits. We here propose an operational procedure for determining an optimal region of the omission/commission relationship and thus selecting high-quality GARP models. Our implementation of this technique for H. anomalus gave a much more reasonable estimation of the species’ potential distribution than did the original suite of models. These findings are relevant to evaluation of other distributional-modeling techniques based on presence-only data and should also be considered with other machine-learning applications modified for use with asymmetrical input data.
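The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give formulas or thresholds, so everything below is an assumption made for illustration: omission is taken as the fraction of known presence localities predicted absent, the commission index as the fraction of study-area cells predicted present, and the "optimal region" as a simple rectangle defined by a hypothetical omission ceiling and a hypothetical commission range. All function and parameter names are invented here.

```python
from dataclasses import dataclass


@dataclass
class ModelScore:
    """Omission and commission values for one model (hypothetical container)."""
    name: str
    omission: float
    commission: float


def omission_rate(predicted_at_presences):
    """Fraction of known presence localities that the model predicts as absent.

    Assumed definition; `predicted_at_presences` holds one truthy/falsy
    prediction per presence record.
    """
    n = len(predicted_at_presences)
    if n == 0:
        raise ValueError("at least one presence record is required")
    missed = sum(1 for p in predicted_at_presences if not p)
    return missed / n


def commission_index(predicted_map):
    """Fraction of study-area cells predicted present (assumed definition).

    With presence-only data this mixes true overprediction with
    unsampled areas that the species may actually inhabit.
    """
    cells = [cell for row in predicted_map for cell in row]
    if not cells:
        raise ValueError("the prediction map is empty")
    return sum(1 for cell in cells if cell) / len(cells)


def select_models(scores, max_omission=0.05, commission_range=(0.4, 0.8)):
    """Keep models with low omission and moderate-to-high commission.

    Both thresholds are illustrative placeholders, not values from the paper.
    """
    low, high = commission_range
    return [
        s for s in scores
        if s.omission <= max_omission and low <= s.commission <= high
    ]


if __name__ == "__main__":
    scores = [
        ModelScore("a", 0.00, 0.60),
        ModelScore("b", 0.30, 0.20),
        ModelScore("c", 0.05, 0.95),
        ModelScore("d", 0.02, 0.50),
    ]
    for s in select_models(scores):
        print(s.name, s.omission, s.commission)
```

In this sketch, a model with very low omission but near-total commission (model "c") is rejected along with the high-omission model "b", matching the abstract's observation that the best models sit at low omission and moderate-to-high commission, not at either extreme.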
Keywords :
GARP, Asymmetrical errors, Genetic algorithms, Confusion matrix, Commission, Range, Omission