[R] SVM probability output variation
Steve Lianoglou
mailinglist.honeypot at gmail.com
Wed Oct 21 20:55:19 CEST 2009
Howdy,
On Oct 21, 2009, at 1:05 PM, Anders Carlsson wrote:
<snip>
> Yes, exactly that. In your example, though, the variation seems to
> be a lot smaller. I'm guessing that has to do with the data.
>
> If I instead output the decision values, the whole procedure is
> fully reproducible, i.e. the exact same values are returned when I
> retrain the model.
By the decision values, you mean the predicted labels, right?
> I have no idea how the probabilities are calculated, but it seems to
> be in this step that the differences arise. In my case, I feel a bit
> hesitant to use them when they differ that much between runs (15% or
> so)...
I'd find that a bit disconcerting, too. Can you give a sample of your
data + the code you're using that reproduces this behavior?
Warning: Brainstorming Below
If I were to calculate probabilities for my class labels, I'd make the
probability some function of the example's distance from the decision
boundary.
Now, if your decision boundary isn't changing from run to run (and I
guess it really shouldn't be, since the SVM returns the maximum margin
classifier, which should be unique for a given data set and set of
parameters, right?), it's hard to imagine why these probabilities
would change, either ...
... unless you're holding out different subsets of your data during
training, or perhaps have a different value for your penalty (cost)
parameter when building the model. I believe you said that you're
actually training the same exact model each time, though, right?
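A quick way to check that is to fit the model twice on identical data
and compare both kinds of output. The sketch below is only an
illustration: it uses the built-in iris data as a stand-in (not your
data), assumes the e1071 package is installed, and fit_once() is a
made-up helper name.

```r
library(e1071)   # provides svm(); assumed to be installed

# Stand-in data (iris), NOT the poster's data set.
# fit_once() is a hypothetical helper: train with identical settings,
# then return decision values and class probabilities for the same rows.
fit_once <- function() {
  m <- svm(Species ~ ., data = iris, probability = TRUE)
  p <- predict(m, iris, decision.values = TRUE, probability = TRUE)
  list(dec  = attr(p, "decision.values"),
       prob = attr(p, "probabilities"))
}

run1 <- fit_once()
run2 <- fit_once()

# Largest absolute difference between the two runs
cat("decision values:", max(abs(run1$dec  - run2$dec)),  "\n")
cat("probabilities:  ", max(abs(run1$prob - run2$prob)), "\n")
```

If the first number is zero and the second is not, the variation is
coming from the probability step rather than from the SVM fit itself.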
Anyway, I see the help page for ?svm says this, if it helps:
"The probability model for classification fits a logistic distribution
using maximum likelihood to the decision values of all binary
classifiers, and computes the a-posteriori class probabilities for the
multi-class problem using quadratic optimization"
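
To make the first part of that concrete, here is a rough sketch of
fitting a logistic curve to decision values by maximum likelihood. It
uses plain glm() on simulated numbers, so it is not the routine
e1071/libsvm actually runs, and the data and the names dec and new_dec
are invented for illustration. It covers only the two-class case and
skips the multi-class quadratic optimization step.

```r
# Simulated stand-in data, NOT from this thread.
set.seed(1)                      # only so the simulated data are repeatable
n <- 200
y <- rep(c(0, 1), each = n / 2)  # true class labels

# Hypothetical decision values: mostly negative for class 0,
# mostly positive for class 1, with some overlap.
dec <- rnorm(n, mean = ifelse(y == 1, 1, -1), sd = 1)

# Maximum-likelihood fit of
#   P(y = 1 | dec) = 1 / (1 + exp(-(a + b * dec)))
fit <- glm(y ~ dec, family = binomial)

# Turn new decision values into probabilities
new_dec <- c(-2, 0, 2)
prob <- predict(fit, newdata = data.frame(dec = new_dec),
                type = "response")
print(round(prob, 3))
```

The point is that, in a fit like this, the probability depends only on
the decision value and the two fitted coefficients, so if the decision
values are stable, any run-to-run change has to come from how those
coefficients are estimated.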
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact