[R] probabilities from predict.svm
Watling,James I
watlingj at ufl.edu
Thu Aug 19 16:56:13 CEST 2010
Hi Steve--
Thanks for your interest in helping me figure this out. I think the problem lies in the probability values returned when the model is used to predict occurrence in a new data frame. The svm model I referenced in the original message (svm.model) does a good job classifying species presence and absence in the test data set I used, so I don't think the problem is with building the predictive svm per se. The problem comes when I take that model and use it to calculate probabilities based on the climate predictors: the resulting probabilities range from 0 to 1, but the probability of presence associated with specific cells just does not make sense.

If you take a look at the maps I attached to the original message, I think the problem becomes very clear. The maps model the probability of occurrence for the American Crocodile, a species with an entirely tropical distribution. The second map looks exactly as the prediction map for the species should: the warmer colors essentially delineate the geographic range of the species. The first map, with probabilities extracted by using svm.model to predict occurrence as a function of climate variables in the same area (the predict.data dataframe), does not make any sense.

I don't think the problem is with getting the probabilities in the right place, because the relative positions of predicted values and NAs used to define the map make sense: the map looks like a map of southern North America and northern South America, just as it should. So the probabilities are in the right place on the map. The problem is that the probabilities associated with each individual cell are, in a word, wrong. The original model (svm.model) was parameterized with 10,000 pseudoabsences drawn from throughout the entire region, so the range of climate values used to create the original model is the same as that reflected in the data I am using to build the prediction map.
I can't think of any reason that the probabilities returned from pred.map <- predict(svm.model, predict.data, probability=T) should be so off-base, but it seems like they are.
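For reference, here is a minimal sketch of two things that may be worth ruling out when pulling probabilities out of predict.svm. It is not a diagnosis of the maps above. As far as I understand e1071, the columns of the "probabilities" attribute are named by class label and ordered by the order in which the classes were first encountered in the training data, so picking a column by position can return the wrong class. Also, the default na.action (na.omit) drops rows of the new data that contain NA, so the result can be shorter than the grid. The data and the names train, new_cells, model, p_presence and p_full below are hypothetical stand-ins, not the objects from this thread.

```r
library(e1071)

set.seed(1)

# Hypothetical stand-in data: one climate-like predictor, presence (1) where warm.
n <- 200
train <- data.frame(t_mar = runif(n, 0, 30))
train$acutus <- factor(ifelse(train$t_mar > 20, 1, 0))

# Put presence records first so the first class seen in training is "1", not "0".
train <- train[order(train$acutus, decreasing = TRUE), ]

model <- svm(acutus ~ t_mar, data = train, probability = TRUE)

# Hypothetical grid cells to predict, including one NA cell (e.g. ocean).
new_cells <- data.frame(t_mar = c(5, 10, NA, 25, 28))

pred  <- predict(model, new_cells, probability = TRUE)
probs <- attr(pred, "probabilities")

# Inspect the column names rather than assuming column 1 or 2 is "presence".
print(colnames(probs))

# Select the presence column by its class label.
p_presence <- probs[, "1"]

# Rows with NA were dropped, so re-align the probabilities to the full grid.
ok <- complete.cases(new_cells)
p_full <- rep(NA_real_, nrow(new_cells))
p_full[ok] <- p_presence
print(p_full)
```

If colnames(probs) comes back in an unexpected order, or nrow(probs) is smaller than nrow(predict.data), then indexing by position or filling the raster without re-aligning could scramble an otherwise sound prediction.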
Any thoughts?
James
Hi James,
I'd like to help you out, but I'm not sure I understand what the problem is.
Does the problem lie with building a predictive SVM, or getting the
right values (class probabilities) to land in the right place on your
map/plot?
-steve
On Wed, Aug 18, 2010 at 3:09 PM, Watling,James I <watlingj at ufl.edu> wrote:
> Dear R Community-
>
> I am a new user of support vector machines for species distribution modeling and am using package e1071 to run svm() and predict.svm(). Briefly, I want to create an svm model for classification of a factor response (species presence or absence) based on climate predictor variables. I have used a training dataset to train the model, and tested it against a validation data set with good results: AUC is high, and the confusion matrix indicates low commission and omission errors. The code for the best-fit model is:
>
> svm.model <- svm(as.factor(acutus) ~ p_feb + p_jan + p_mar + p_sep + t_feb + t_july + t_june + t_mar, cost=10000, gamma=1, probability=T)
>
> Because ultimately I want to create prediction maps of probabilities of species occurrence under future climate change, I want to use the results of the validated model to predict probability of presence using data describing future conditions. I have created a data frame (predict.data) with new values for the same predictor variables used in the original model; each value corresponds to an observation from a raster grid of the study area. I enabled the probability option when creating the original model, and acquire the probabilities using the predict function:
> pred.map <- predict(svm.model, predict.data, probability=T). However, when I use probs <- attr(pred.map, "probabilities") to acquire the probabilities for each grid cell, the spatial signature of the probabilities does not make sense. I have extracted the column of probabilities for class = 1 (probability of presence), and the resulting map of the study area is spatially accurate (it has the right shape), but the probability values are incorrect, or at least in the wrong place. I am attaching a pdf (SVM prediction maps) of the resulting map using probabilities obtained using the code described above (page 1) and a map of what the prediction map should look like given spatial autocorrelation in climate predictors (page 2, map generated using openmodeller). Note that the openmodeller map was created with the same input data and same svm algorithm (also using code from libsvm) as the model in R, just run using different software. I don't know why the prediction map of probabilities based on the model is so different from what I would expect, and would appreciate any thoughts from the group.
>
> All the best
>
> James
>
> *******************************************************************************
> James I Watling, PhD
> Postdoctoral Research Associate
> University of Florida
> Ft. Lauderdale Research & Education Center
> 3205 College Avenue
> Ft Lauderdale, FL 33314 USA
> 954.577.6316 (phone)
> 954.475.4125 (fax)
>
>
> *******************************************************************************
