[R] Calculating R2 for a unit slope regression
Rolf Turner
r.turner at auckland.ac.nz
Mon Nov 3 21:15:14 CET 2008
On 4/11/2008, at 4:30 AM, J. Sebastian Tello wrote:
> Does anyone know of a literature reference, or a piece of code that
> can help me
> calculate the amount of variation explained (R2 value), in a
> regression constrained
> to have a slope of 1 and an intercept of 0?
The question is ``wrong''. The idea of ``amount of variation explained''
depends on decomposing the ``total sum of squares'' into two pieces
--- the sum of squares of the residuals, and what is left over, which
is the sum of squares ``explained by the model''. In the usual
regression setting this is
sum((y_i - ybar)^2) = sum((y_i - yhat_i)^2) + sum((yhat_i - ybar)^2)
or
SST = SSE + SSR (T for total, E for error, R for regression)
where yhat_i results from fitting the model by least squares.
The R-squared value is SSR/SST or 1 - SSE/SST. (Or this quantity
times 100%.)
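The decomposition can be checked numerically in R with any ordinary
least-squares fit; the data below are made up purely for illustration:

```r
# Verify SST = SSE + SSR and R^2 = SSR/SST for an ordinary lm() fit.
set.seed(42)
x <- 1:20
y <- 2 + 3 * x + rnorm(20)   # arbitrary illustrative data

fit  <- lm(y ~ x)
yhat <- fitted(fit)

SST <- sum((y - mean(y))^2)      # total sum of squares
SSE <- sum((y - yhat)^2)         # residual (error) sum of squares
SSR <- sum((yhat - mean(y))^2)   # regression ("explained") sum of squares

all.equal(SST, SSE + SSR)                      # TRUE
all.equal(SSR / SST, summary(fit)$r.squared)   # TRUE
```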
However, if you constrain the slope to be 1 and the intercept to be 0,
then yhat_i = x_i and the foregoing identity does not hold. The
problem is that the ``sum of squares left over'' can be negative (and
hence not a sum of squares).
I.e. in this case you have
SST = SSE + something
where ``something'' is not necessarily a sum of squares.
Thus you can have the ``amount of variation explained'' being negative!
E.g. take x_1 = -1, x_2 = 1, y_1 = 1, y_2 = -1. In this setting the
``total sum of squares'' is 2 and the ``residual sum of squares'' is 8,
so the ``amount of variation explained by the model'' is -6, or you
could say that R-squared is -300%. (!!!)
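The counterexample is easy to check directly in R:

```r
# With the fit constrained to yhat_i = x_i (slope 1, intercept 0),
# the "explained" sum of squares SST - SSE comes out negative.
x <- c(-1, 1)
y <- c( 1, -1)
yhat <- x                        # the constrained "fitted values"

SST <- sum((y - mean(y))^2)      # 2
SSE <- sum((y - yhat)^2)         # 8
SST - SSE                        # -6: not a sum of squares
1 - SSE / SST                    # -3, i.e. an "R-squared" of -300%
```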
Bottom line --- the R-squared concept makes no sense in this context.
The R-squared concept is at best dubious, and should be used, if at all,
only in the completely orthodox setting.
cheers,
Rolf Turner