[R] probabilities from predict.svm

Fri Aug 20 16:07:08 CEST 2010

Awesome!

Good news, James. Thanks for letting us know. Glad you were able to
sort this out.

-steve

On Thu, Aug 19, 2010 at 5:00 PM, Watling,James I <watlingj at ufl.edu> wrote:
> Hi Steve--
>
> I spent some more time tuning the model with alternative gamma and cost values, but still kept coming back to the same issue re: probabilities. I spent some more time playing around with the code, and realized that the error did indeed have to do with the ifelse() function I used to feed the probabilities into the ascii file.  I have rewritten the code with a replace() statement, and the probabilities have 'landed' in the correct place in the ascii file.  The resulting map is exactly what I would expect.
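
For anyone who finds this thread later: James's actual code is not shown here, so the following is only a guess at the kind of mix-up that ifelse() can cause. It assumes a grid vector with NA for no-data cells and a shorter vector of predicted probabilities, one per non-NA cell; the names `grid` and `probs` are made up for illustration. ifelse() recycles `yes` to the full length of the test and picks values by position, while replace() fills the selected cells in order.

```r
# Hypothetical raster cells: NA marks no-data cells, 1 marks cells with predictors
grid  <- c(NA, 1, NA, 1, 1)
# Hypothetical predicted probabilities, one per non-NA cell, in cell order
probs <- c(0.1, 0.5, 0.9)

# ifelse() recycles probs to length 5 and takes the values at positions 2, 4, 5,
# so the probabilities end up in the wrong cells
wrong <- ifelse(!is.na(grid), probs, NA)

# replace() assigns probs sequentially to the non-NA cells
right <- replace(grid, !is.na(grid), probs)

print(wrong)
print(right)
```

If the prediction vector already has the same length as the grid (one value per cell, including no-data cells), the two calls give the same result, so this only matters when the lengths differ.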
>
> Thanks for your helpful suggestions that forced me to figure this out!
>
> Much appreciated
>
> James
>
>
> -----Original Message-----
> From: Steve Lianoglou [mailto:mailinglist.honeypot at gmail.com]
> Sent: Thursday, August 19, 2010 11:39 AM
> To: Watling,James I
> Cc: r-help at lists.R-project.org
> Subject: Re: [R] probabilities from predict.svm
>
> On Thu, Aug 19, 2010 at 10:56 AM, Watling,James I <watlingj at ufl.edu> wrote:
>> Hi Steve--
>>
>> Thanks for your interest in helping me figure this out.  I think the problem has to do with the values of the probabilities returned from the use of the model to predict occurrence in a new dataframe.
>
> Ok, so if you're sure this is the problem, and not, say, getting the
> correct values for the predictor variables at a given point, then I'd
> be a bit more thorough when building your model.
>
> Originally you said:
>
>> I have used a training dataset to train the model, and tested it against a validation data set with good results: AUC is high, and the confusion matrix indicates low commission and omission errors.
>
> Maybe your originally "good" AUC was just a function of your train/test split?
>
> Why not use all of your data and do something like 10-fold cross
> validation to find:
>
> (1) Your average accuracy over your folds
> (2) The best value for your cost parameter (how did you pick cost=10000?)
> (3) Or even the best kernel to use.
>
> Doing (2) and (3) will likely be time-consuming. To help with (2) you
> might try looking at the svmpath package:
>
> http://cran.r-project.org/web/packages/svmpath/index.html
>
> It only works on 2-class classification problems, and (I think) only
> with a linear kernel (sorry, I don't remember offhand, but it's
> documented in the package help and linked publications).
>
> You don't need to use svmpath, but then you'll need to define a "grid"
> of C values (or maybe a 2D grid, if your SVM + kernel combination has
> more parameters) and train over these values. That takes lots of CPU
> time, but not too much human time.
>
> Does that make sense?
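
A minimal sketch of the grid search described above, assuming the e1071 package (which provides svm() and predict.svm) and using two iris species as stand-in data, since the original dataset is not part of this thread. The cost and gamma ranges are illustrative only, not recommendations.

```r
library(e1071)

set.seed(1)
# Stand-in data: two iris species as a 2-class problem
dat <- droplevels(subset(iris, Species != "setosa"))

# Candidate values on a log scale (illustrative ranges only)
param_grid <- list(cost = 10^(-1:3), gamma = 10^(-3:0))

# 10-fold cross-validation over every cost/gamma combination
tuned <- tune(svm, Species ~ ., data = dat,
              kernel = "radial",
              ranges = param_grid,
              tunecontrol = tune.control(sampling = "cross", cross = 10))

print(tuned$best.parameters)   # cost/gamma pair with the lowest CV error
print(tuned$best.performance)  # mean misclassification rate over the folds

# Refit with the chosen parameters, requesting probability estimates
fit <- svm(Species ~ ., data = dat, kernel = "radial",
           cost = tuned$best.parameters$cost,
           gamma = tuned$best.parameters$gamma,
           probability = TRUE)
pred <- predict(fit, dat, probability = TRUE)
print(head(attr(pred, "probabilities")))
```

To compare kernels as well, the same call can be repeated with a different `kernel` argument and the resulting `best.performance` values compared.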
>

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact