[R] SVM probability output variation

Wed Oct 21 20:55:19 CEST 2009

Howdy,

On Oct 21, 2009, at 1:05 PM, Anders Carlsson wrote:
<snip>
> Yes, exactly that. In your example, though, the variation seems to  
> be a lot smaller. I'm guessing that has to do with the data.
>
> If I instead output the decision values, the whole procedure is  
> fully reproducible, i.e. the exact same values are returned when I  
> retrain the model.

By the decision values, you mean the predicted labels, right?

> I have no idea how the probabilities are calculated, but it seems to  
> be in this step that the differences arise. In my case, I feel a bit  
> hesitant to use them when they differ that much between runs (15% or  
> so)...

I'd find that a bit disconcerting, too. Can you send a sample of your  
data plus the code you're using, so that we can reproduce this?

Warning: Brainstorming Below

If I were to calculate probabilities for my class labels, I'd make the  
probability some function of the example's distance from the decision  
boundary.
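
To make that idea concrete, here is a minimal sketch in base R. The  
helper decision_to_prob() and its parameters A and B are made up for  
illustration (they are not part of e1071), and the decision values are  
invented numbers, not output from a real model:

```r
# Hypothetical helper (not part of e1071): squash a signed distance from the
# decision boundary into (0, 1) with a logistic curve.
# A and B are assumed, hand-picked parameters; in practice they would be fitted.
decision_to_prob <- function(f, A = -1, B = 0) {
  1 / (1 + exp(A * f + B))
}

f <- c(-2, -0.5, 0, 0.5, 2)   # made-up decision values
p <- decision_to_prob(f)      # larger distance on the positive side -> higher probability
print(round(p, 3))
```

With a mapping like this, identical decision values and identical A  
and B would always give identical probabilities.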

Now, if your decision boundary isn't changing from run to run (and I  
guess it really shouldn't be, since the SVM returns the maximum-margin  
classifier, which is unique by definition, right?), it's hard to  
imagine why these probabilities would change, either ...

... unless you're holding out different subsets of your data during  
training, or perhaps have a different value for your penalty (cost)  
parameter when building the model. I believe you said that you're  
actually training the same exact model each time, though, right?
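
One way to check this would be something like the sketch below. It  
assumes the e1071 package is installed and uses the built-in iris data  
as a stand-in for your data; fit_once() is just a helper name I made  
up. It trains the same model twice and compares what comes back:

```r
library(e1071)   # provides svm(); assumed to be installed

# Train the same model on the same data with identical settings and
# return both the decision values and the class probabilities.
fit_once <- function() {
  m <- svm(Species ~ ., data = iris, probability = TRUE)
  p <- predict(m, iris, decision.values = TRUE, probability = TRUE)
  list(dec = attr(p, "decision.values"), prob = attr(p, "probabilities"))
}

run1 <- fit_once()
run2 <- fit_once()

# Compare the two runs: decision values first, then probabilities
print(all.equal(run1$dec, run2$dec))
print(max(abs(run1$prob - run2$prob)))
```

If the first comparison is clean and the second is not, the difference  
has to come from the probability step rather than from the SVM itself.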

Anyway, I see the help page for ?svm says this, if it helps:

"The probability model for classification fits a logistic distribution  
using maximum likelihood to the decision values of all binary  
classifiers, and computes the a-posteriori class probabilities for the  
multi-class problem using quadratic optimization"
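
Reading that, the binary case sounds like a logistic fit on the  
decision values. Here is a rough base R sketch of that idea using  
glm() on simulated numbers. This is not the code e1071 or libsvm  
actually runs, fit_sigmoid() is a name I made up, and the random  
subsets are purely my assumption (I have not checked whether the real  
fitting step partitions the data at random):

```r
set.seed(1)                                   # fixed seed so the toy data is repeatable
n <- 200
y <- rep(c(0, 1), each = n / 2)               # made-up binary labels
f <- rnorm(n, mean = ifelse(y == 1, 1, -1))   # made-up decision values

# Maximum-likelihood fit of P(y = 1 | f) = 1 / (1 + exp(-(a + b * f)))
fit_sigmoid <- function(f, y) coef(glm(y ~ f, family = binomial()))

full <- fit_sigmoid(f, y)

# Refit on two different random subsets of the same decision values
idx1 <- sample(n, 150)
idx2 <- sample(n, 150)
sub1 <- fit_sigmoid(f[idx1], y[idx1])
sub2 <- fit_sigmoid(f[idx2], y[idx2])

print(rbind(full, sub1, sub2))
```

The point is only that the fitted logistic parameters depend on which  
examples go into the fit, so any randomness in that step would show up  
in the probabilities even when the decision values stay fixed.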

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
   |  Memorial Sloan-Kettering Cancer Center
   |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact