[R] Coefficient of determination -- should be "compare to a trivial model"

John C Nash nashjc at uottawa.ca
Wed Jun 17 14:54:58 CEST 2009


As a long-time nonlinear modeller, I always compute a quantity
commonly referred to as R_squared or the coefficient of
determination. However, I agree with other commentators, including
those of several years ago, that one wants to be very careful about
interpretation. In fact, I would say "DO NOT interpret".

My usage of the quantity, which I tend to call "R_squared" but do
not think of as some sort of squared correlation, is simply a
comparison of the current model -- no matter how arrived at -- with a
model that uses a single linear parameter that we usually call the
mean. If this "R_squared" is small or negative, I've got a pretty bad
model. And for models that don't include a linear parameter on its
own, negatives are not only possible, but not too uncommon. They
indicate one has tried to use a model that is very much at odds with
the data, or that there is a mistake somewhere, e.g., ^ vs. * typed
into a formula, or a misplaced bracket.
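
A minimal sketch of this check in R, under stated assumptions: the
helper name r_squared, the data, and the parameter value b = 0.5 are
all made up for illustration, and the "typo" model simply stands in
for the kind of ^ vs. * slip mentioned above. It computes one minus
the ratio of the model's residual sum of squares to that of the
mean-only model.

```r
# Hypothetical helper (not from any package): compare a model's residual
# sum of squares with that of the trivial model that predicts mean(y).
r_squared <- function(y, fitted) {
  ss_model <- sum((y - fitted)^2)    # residual SS of the current model
  ss_mean  <- sum((y - mean(y))^2)   # residual SS of the mean-only model
  1 - ss_model / ss_mean
}

# Made-up data, roughly following exp(-0.5 * x)
x <- 1:5
y <- c(0.61, 0.37, 0.22, 0.14, 0.08)
b <- 0.5                             # assumed parameter value, not estimated

# Intended model, which has no standalone constant term
good <- r_squared(y, exp(-b * x))    # close to 1

# Same formula with ^ typed instead of *
bad <- r_squared(y, exp(-b ^ x))     # negative: worse than the mean

# The mean-only model itself scores zero by construction
trivial <- r_squared(y, rep(mean(y), length(y)))

print(c(good = good, bad = bad, trivial = trivial))
```

The value is read only as a warning flag: near 1 says the model beats
the mean comfortably, near zero or below says to go back and look at
the model or the formula.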

A lot of the trouble comes from a long history of over-working of the
algebra in the y ~ a + b x simple linear regression case.

If we changed the name to something neutral like the "Nipigon
warning measure", a lot of the concerns might go away, and UseRs
could have a quick and simple sanity check for modelling.

JN



