[R] binning results

Noah Silverman noah at smartmediacorp.com
Wed Aug 5 20:11:05 CEST 2009


Hello,

I asked this as part of a previous message, but never really figured out 
a usable solution.  So this is a second attempt.

I have a process containing an SVM.  The end result is the probability 
that the class is true.  That result is added back to the original data.

So I wind up with a data.frame that looks like this

label,v1,v2,v3,prob_true

What I want to do is measure how accurate my model is for each range of 
probability.  (I've seen this done in a few published papers and found 
it a very useful way to visualize things.)

My hope/guess is that there is some kind of package for R that does this 
since it should be a common need.

Here is an example of what I'd like to be able to generate:

range      number of items   mean(probability)   true_accuracy
100-90%    20                .924                .90
90-80%     50                .825                .84
80-70%     214               .75                 .71
etc...

range is the range of predicted values by the SVM
mean(probability) is the mean of the PREDICTED probability of items in 
that range
true_accuracy is the ACTUAL fraction of items in that range that are true.

In English I would explain it as, "Of the data where our SVM predicted a 
true probability of 70-80%, the data was actually 71% true."
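
To make the table concrete, here is a minimal sketch in base R of the kind of binning described above. It is only an illustration under assumptions: the data are simulated stand-ins, label is taken to be coded 0/1, and the 10%-wide bins and the name calib are made up for the example.

```r
# Simulated stand-in for the real data (assumption): label is 0/1 and
# prob_true is the probability predicted by the SVM.
set.seed(1)
n <- 1000
prob_true <- runif(n)
label <- rbinom(n, 1, prob_true)
d <- data.frame(label = label, prob_true = prob_true)

# Assign each row to a 10%-wide range of predicted probability.
d$range <- cut(d$prob_true, breaks = seq(0, 1, by = 0.1),
               include.lowest = TRUE)

# One row per range: item count, mean predicted probability,
# and the observed fraction of items whose label is true.
calib <- data.frame(
  n             = as.vector(table(d$range)),
  mean_prob     = as.vector(tapply(d$prob_true, d$range, mean)),
  true_accuracy = as.vector(tapply(d$label, d$range, mean)),
  row.names     = levels(d$range)
)

# Show the highest range first, as in the example table.
print(calib[rev(seq_len(nrow(calib))), ])
```

With real data, only the two columns label and prob_true of the existing data.frame would be needed; the bin width is a free choice.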

It might be really helpful to be able to graph this somehow, with 
mean(predicted_probability) on one axis and mean(true_probability) 
on the other axis.  (Again, there must be some package in R for this??)
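
A rough sketch of such a graph using only base R graphics, again under assumptions: the data are simulated, label is coded 0/1, and the ranges are 10% wide. The dashed diagonal marks where predicted and actual probabilities would agree exactly.

```r
# Simulated stand-in for the real data (assumption): label is 0/1 and
# prob_true is the probability predicted by the SVM.
set.seed(1)
prob_true <- runif(1000)
label <- rbinom(1000, 1, prob_true)

# Group the predictions into 10%-wide ranges.
bin <- cut(prob_true, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)

# Per range: mean predicted probability and observed fraction true.
mean_pred <- as.vector(tapply(prob_true, bin, mean))
mean_true <- as.vector(tapply(label, bin, mean))

# One point per range; points near the diagonal indicate that the
# predicted probabilities match the observed frequencies.
plot(mean_pred, mean_true, type = "b", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "mean(predicted_probability)",
     ylab = "mean(true_probability)")
abline(0, 1, lty = 2)
```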

Any thoughts, comments, ideas, etc. would be appreciated!

Thank You



