# [R] Best performance measure?

Frank E Harrell Jr f.harrell at vanderbilt.edu
Wed Aug 19 21:48:54 CEST 2009

```
Noah Silverman wrote:
> Frank,
>
> Visually, the loess curve really helps me see how the model is doing.
>
> That leads me to two more questions:
>
> 1) Can I somehow summarize the loess curve into a single value?  (If I'm
> comparing a few hundred models/parameters it would be nice to have a
> single "performance" value to use.)

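The function computes two such measures from the loess curve: the mean absolute
error and the 0.9 quantile of the absolute error (the error being the gap between
the predicted probability and the loess-calibrated probability).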

>
> 2) Is there a way to focus on a segment of the loess curve?  With the
> binning setup, I can quickly see that my model is very accurate for a
> specific range of probabilities and then loses accuracy.  For example,
> with binning, my model is very accurate for probabilities from .1 to
> .5; above .5, accuracy drops off significantly.  This is actually very

That may be an artifact of binning; loess is much better for that.

Signing off for now,
Frank

> useful for my application, as I know that in the real world I can reliably
> count on predictions below .5 and cannot count on predictions above .5.
>
> Thanks for the continued help!
>
> -N
>
>
> On 8/19/09 12:11 PM, Frank E Harrell Jr wrote:
>> Noah Silverman wrote:
>>> Frank,
>>>
>>> That makes sense.
>>>
>>> I just had a look at the actual algorithm calculating the Brier score.
>>> One thing that confuses me is how the score is calculated.
>>>
>>> If I understand the code correctly, it is just:  sum((p - y)^2)/n
>>>
>>> If I have an example with a label of 1 and a probability prediction
>>> of .4, its contribution is (.4 - 1)^2 (I know the score is the average
>>> of these values across all the examples).
>>
>> Yes, and I seem to remember that the original score is 1 minus that.
>>
>>>
>>> Wouldn't it make more sense to stratify the probabilities and then
>>> check the accuracy of each level?
>>
>> The stratification will bring a great deal of noise into the problem.
>> Better: loess calibration curves or decomposition of the Brier score
>> into discrimination and calibration components (which is not in the
>> software).
>>
>> Frank
>>
>>>
>>> e.g., for predicted probabilities of .10 to .20, the data were actually
>>> labeled true 18 percent of the time (mean(label)).
>>>
>>> On 8/19/09 11:51 AM, Frank E Harrell Jr wrote:
>>>> Noah Silverman wrote:
>>>>> Thanks for the suggestion.
>>>>>
>>>>> You explained that Brier combines both accuracy and discrimination
>>>>> ability.  If I understand you right, that is in relation to binary
>>>>> classification.
>>>>>
>>>>> I'm not concerned with binary classification, but the accuracy of
>>>>> the probability predictions.
>>>>>
>>>>> Is there some kind of score that measures just the accuracy?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> -N
>>>>
>>>> The Brier score has nothing to do with classification.  It is a
>>>> probability accuracy score.
>>>>
>>>> Frank
>>>>
>>>>>
>>>>> On 8/19/09 10:42 AM, Frank E Harrell Jr wrote:
>>>>>> Noah Silverman wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am working on a model to predict probabilities.
>>>>>>>
>>>>>>> I don't really care about binary prediction accuracy.
>>>>>>>
>>>>>>> I do really care about the accuracy of my probability predictions.
>>>>>>>
>>>>>>> Frank was nice enough to point me to the val.prob function from
>>>>>>> the Design library.  It looks very promising for my needs.
>>>>>>>
>>>>>>> I've put together some tests and run the val.prob analysis.  It
>>>>>>> produces some very informative graphs along with a bunch of
>>>>>>> performance measures.
>>>>>>>
>>>>>>> Unfortunately, I'm not sure which measure, if any, is the "best"
>>>>>>> one.  I'm comparing hundreds of different models/parameter
>>>>>>> combinations/etc.  So ideally I'd like a single value or two as
>>>>>>> the "performance measure" for each one.  That way I can pick the
>>>>>>> "best"  model from all my experiments.
>>>>>>>
>>>>>>> As mentioned above, I'm mainly interested in the accuracy of my
>>>>>>> probability predictions.
>>>>>>>
>>>>>>> Does anyone have an opinion about which measure I should look at?
>>>>>>> (I see Dxy, C, R2, D, U, Brier, Emax, Eavg, etc.)
>>>>>>>
>>>>>>> Thanks!!
>>>>>>>
>>>>>>> -N
>>>>>>
>>>>>> It all depends on the goal, i.e., the relative value you place on
>>>>>> absolute accuracy vs. discrimination ability. The Brier score
>>>>>> combines both and, apart from being hard to interpret, has many advantages.
>>>>>>
>>>>>> Frank
>>>>>>
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>

--
Frank E Harrell Jr   Professor and Chair           School of Medicine
Department of Biostatistics   Vanderbilt University

```
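Noah's reading of the Brier score in the thread, `sum((p - y)^2)/n` for predicted probabilities `p` and 0/1 labels `y`, can be checked with a short sketch. Python is used here only for illustration (the thread itself concerns R's `val.prob`), and the name `brier_score` is hypothetical, not part of any package mentioned above. Lower is better: a perfect forecaster scores 0, and always predicting 0.5 scores 0.25 regardless of the labels.

```python
def brier_score(p, y):
    """Mean squared difference between predicted probabilities p and 0/1
    outcomes y, i.e. the R expression sum((p - y)^2)/n quoted in the thread."""
    if not p or len(p) != len(y):
        raise ValueError("p and y must be non-empty and of equal length")
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(p)


# Noah's single example: label 1, prediction 0.4, contributes (0.4 - 1)^2 = 0.36.
print(brier_score([0.4], [1]))

# Averaged over several examples: (0.01 + 0.04 + 0.16) / 3 = 0.07.
print(brier_score([0.9, 0.2, 0.6], [1, 0, 1]))

# Always predicting 0.5 gives 0.25 whatever the labels are.
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 1]))
```

Because every example contributes a squared gap between a probability and what actually happened, the score rewards sharp predictions only when they are also well calibrated, which is why it mixes calibration and discrimination.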
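Frank's suggestion of reducing the loess calibration curve to single numbers (the mean and the 0.9 quantile of the absolute calibration error) can be sketched as follows. This is a simplified illustration under stated assumptions, not `val.prob`'s code: `local_linear_smooth` and `calibration_error_summary` are hypothetical names, the smoother is a plain tricube-weighted local linear fit standing in for R's loess/lowess (no robustness iterations), and the span `frac=2/3` is an assumed default. The 0.9 quantile uses `statistics.quantiles(..., method="inclusive")`, which interpolates like R's default `quantile()` (type 7).

```python
import math
import random
import statistics


def local_linear_smooth(x, y, frac=2 / 3):
    """Tricube-weighted local linear fit of y on x, evaluated at each x[i].
    Simplified stand-in for R's lowess()/loess() (assumption: no robustness
    iterations), clipped to [0, 1] because the fitted values are probabilities."""
    n = len(x)
    k = max(2, min(n, math.ceil(frac * n)))  # number of neighbours per window
    fitted = []
    for x0 in x:
        # Window half-width: distance to the k-th nearest point (tiny if all tie).
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)  # >= 1, since x0 itself always gets weight 1
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(min(1.0, max(0.0, ybar + slope * (x0 - xbar))))
    return fitted


def calibration_error_summary(p, y, frac=2 / 3):
    """Return (mean, 0.9 quantile) of |p - smoothed calibration curve at p|."""
    if len(p) < 2 or len(p) != len(y):
        raise ValueError("need at least two (p, y) pairs of equal length")
    smooth = local_linear_smooth(p, y, frac)
    abs_err = [abs(pi - si) for pi, si in zip(p, smooth)]
    mean_abs = statistics.fmean(abs_err)
    q90 = statistics.quantiles(abs_err, n=10, method="inclusive")[8]
    return mean_abs, q90


# Usage on simulated (hypothetical) data: outcomes are drawn from the predicted
# probabilities themselves, so both summaries should come out small.
rng = random.Random(1)
p = [rng.uniform(0.05, 0.95) for _ in range(500)]
y = [1 if rng.random() < pi else 0 for pi in p]
print(calibration_error_summary(p, y))
```

Ranking hundreds of candidate models by the pair (mean, 0.9 quantile) gives one number for typical miscalibration and one for the worst stretches of the curve, without committing to arbitrary bins.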
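Noah's proposed stratification (for example, the observed frequency of true labels, `mean(label)`, among predictions between .10 and .20) is easy to compute. Reporting the number of cases and a binomial standard error for each bin makes Frank's objection concrete: sparse bins give noisy frequencies, and the picture also depends on where the bin edges fall. The function name `binned_calibration` and the default decile edges are assumptions chosen for illustration.

```python
import math


def binned_calibration(p, y, edges=tuple(i / 10 for i in range(11))):
    """Group predictions p into bins [lo, hi) (the last bin also includes hi)
    and return one row per non-empty bin:
    (lo, hi, count, mean predicted, observed frequency of y == 1, binomial SE)."""
    rows = []
    for i in range(len(edges) - 1):
        lo, hi = edges[i], edges[i + 1]
        last = i == len(edges) - 2
        idx = [j for j, pj in enumerate(p) if lo <= pj < hi or (last and pj == hi)]
        m = len(idx)
        if m == 0:
            continue  # empty bins carry no information
        mean_p = sum(p[j] for j in idx) / m
        freq = sum(y[j] for j in idx) / m  # mean(label) within the bin
        se = math.sqrt(freq * (1 - freq) / m)  # binomial standard error
        rows.append((lo, hi, m, mean_p, freq, se))
    return rows


# Hypothetical data: three predictions in [.1, .2) and one in [.5, .6).
for row in binned_calibration([0.15, 0.12, 0.18, 0.55], [1, 0, 0, 1]):
    print("bin [%.1f, %.1f): n=%d, mean p=%.3f, observed=%.3f, SE=%.3f" % row)
```

With only three cases in the [.1, .2) bin, the standard error of the observed frequency is about 0.27, which shows how much noise stratification can introduce when bins are small.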