[R] Estimating R2 value for a unit-slope regression
J. Sebastian Tello
jsebastiantello at yahoo.com
Sat Nov 1 00:30:46 CET 2008
Dear all,
I need to estimate the amount of variation
explained in a variable after simulations that produce a predictor
which is in the same units as the dependent variable (numbers of
species). Since the dependent and predictor variables measure the same quantity, I would think the most appropriate analysis would be a
regression constrained to have an intercept of 0 and a slope of 1. I
am trying to write a piece of R code to do this, but I am running into
some problems, so I wanted to ask for your advice. I have investigated
3 approaches, and I am including a jpg file showing the behaviour of these
three pieces of code (R2 values as a function of the slope of the OLS regression). I also included the regular R2 value from the OLS regression for comparison (black symbols in figure).
R2 for a regression can be calculated by the formula: R2= (SSY-SSE)/SSY; so:
#1 Green symbols in figure
SSY<-sum((y-mean(y))^2)
SSE<-sum((y-x)^2)
R2<-(SSY-SSE)/SSY
where y is the dependent and x the predictor variables respectively, of course.
However, I am running into trouble because sometimes the residual sum of squares (SSE) is larger than the sum of squares of the
dependent variable (SSY), and I end up with negative R2 values, which of
course make no sense.
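A minimal sketch of how approach #1 can go negative, using made-up numbers rather than values from the simulations. Here the hypothetical predictor x overshoots y by a constant 5 species. Because the 1:1 line is fixed rather than fitted, nothing guarantees that its residuals are smaller than the deviations of y around its own mean:

```r
# Made-up data: y is observed richness, x is a prediction that
# overshoots y by a constant offset (an assumption for illustration).
y <- c(10, 12, 14, 16, 18)
x <- y + 5

SSY <- sum((y - mean(y))^2)  # total sum of squares around the mean of y
SSE <- sum((y - x)^2)        # residuals from the fixed 1:1 line
R2  <- (SSY - SSE) / SSY
R2                           # -2.125
```

One possible reading of a negative value is that the 1:1 line predicts y worse than the mean of y does.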
Another way to put the same formula is: R2=SSR/(SSR+SSE); so:
#2 Blue symbols in figure
SSR<-sum((x-mean(y))^2)
SSE<-sum((y-x)^2)
R2<-SSR/(SSR+SSE)
This
approach behaves better in the sense that it stays within the expected
0 to 1 range and peaks when the slope is equal to 1, but its decay as
the slope moves away from 1 is too slow. For example, when the slope
is zero, the R2 value according to this formula is about 0.4.
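A short sketch of why approach #2 cannot leave the 0 to 1 range: SSR and SSE are both sums of squares, so neither can be negative and their ratio is bounded. The function name r2_ratio and the data are hypothetical, chosen only for illustration:

```r
# Approach #2 wrapped in a function (the name r2_ratio is hypothetical).
r2_ratio <- function(y, x) {
  SSR <- sum((x - mean(y))^2)  # spread of the predictions around the mean of y
  SSE <- sum((y - x)^2)        # residuals from the fixed 1:1 line
  SSR / (SSR + SSE)
}

y <- c(10, 12, 14, 16, 18)     # made-up values
perfect <- r2_ratio(y, y)      # SSE is 0, so the ratio is 1
biased  <- r2_ratio(y, y + 5)  # constant offset: still between 0 and 1
```

Note that in this sketch a constant offset in x increases SSR as well as SSE, which may be part of why the measure decays slowly.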
The third and final approach that I have used is the one described by Romdal et al. 2005. They
use the second formula, R2=SSR/(SSR+SSE), but (at least as I
understand it) they take the regression sum of squares from a regular OLS fit,
so the corresponding code would be:
#3 Red symbols in figure
lm.y.x<-lm(y~x)
SSR<-(deviance(lm(y~1))-sum((lm.y.x$residuals)^2))
SSE<-sum((y-x)^2)
R2<-SSR/(SSR+SSE)
This
of course also stays within the expected range of 0 to 1, but it has its
own troubling behaviour: it does not peak at a slope of 1, there is an
accelerated decrease at slopes less than 1 but not at slopes larger
than 1, and it increases again at slopes less than 0 (as if negative
associations between y and x were better than a flat line, which again
makes no sense when the predictor is the same variable as the
dependent).
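A rough sketch for comparing the three approaches side by side across a range of slopes. The data-generating step (y = b*x plus normal noise, with an arbitrary seed, sample size, and noise level) is an assumption made for illustration and is not how the original simulations or the attached figure were produced. The function name r2_all is hypothetical:

```r
set.seed(1)                   # arbitrary seed, for reproducibility only
x <- runif(50, 0, 100)        # made-up predictor values

# Returns the OLS slope, the regular OLS R2, and the three candidate R2s.
r2_all <- function(y, x) {
  fit  <- lm(y ~ x)
  SSY  <- sum((y - mean(y))^2)
  SSE  <- sum((y - x)^2)                 # residuals from the 1:1 line
  SSR2 <- sum((x - mean(y))^2)           # approach #2
  SSR3 <- SSY - sum(residuals(fit)^2)    # approach #3: OLS regression SS
  c(slope = unname(coef(fit)[2]),
    ols   = summary(fit)$r.squared,
    r2_1  = (SSY - SSE) / SSY,
    r2_2  = SSR2 / (SSR2 + SSE),
    r2_3  = SSR3 / (SSR3 + SSE))
}

slopes <- seq(-1, 3, by = 0.5)
res <- t(sapply(slopes, function(b) r2_all(b * x + rnorm(50, sd = 10), x)))
round(res, 3)                 # one row per simulated slope
```

Plotting each R2 column of res against the slope column should give curves that can be compared with the behaviour described above, although the exact values depend on the assumed data.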
Any advice, recommendations for appropriate literature, or pieces of code would be highly appreciated.
Best,
Sebastian
J. Sebastián Tello
Department of Biological Sciences
285 Life Sciences Building
Louisiana State University
Baton Rouge, LA, 70803
(225) 578-4284 (office and lab.)