[R] R vs. Excel (R-squared)

Tue Jan 24 21:25:36 CET 2006

On 1/24/06, Lance Westerhoff <lance at quantumbioinc.com> wrote:
>
> Hi-
>
> On Jan 24, 2006, at 12:08 PM, Peter Dalgaard wrote:
>
> > Lance Westerhoff <lance at quantumbioinc.com> writes:
> >
> >> Hello All-
> >>
> >> I found an inconsistency between the R-squared reported in Excel vs.
> >> that in R, and I am wondering which (if any) may be correct and if
> >> this is a known issue.  While it certainly wouldn't surprise me if
> >> Excel is just flat out wrong, I just want to make sure since the R-
> >> squared reported in R seems surprisingly high.  Please let me know if
> >> this is the wrong list.  Thanks!
> >
> > Excel is flat out wrong. As the name implies, R-squared values cannot
> > be less than zero (adjusted R-squared can, but I wouldn't think
> > that is what Excel does).
>
> I had thought the same thing, but then I came across the following
> site which states: "Note that it is possible to get a negative R-
> square for equations that do not contain a constant term. If R-square
> is defined as the proportion of variance explained by the fit, and if
> the fit is actually worse than just fitting a horizontal line, then R-
> square is negative. In this case, R-square cannot be interpreted as
> the square of a correlation." Since
>
> R^2 = 1 - (SSE/SST)
>
> I guess you can have SSE > SST which would result in a R^2 of less
> then 1.0.  However, it still seems very strange which made me wonder
> what is going on in Excel needless to say!
>
> http://www.mathworks.com/access/helpdesk/help/toolbox/curvefit/
> ch_fitt9.html

This seems to be a case of using the wrong formula.  R^2 should
measure the amount of variation for which the given model accounts
relative to the amount of variation for which the *appropriate* null
model does not account.  If you have a constant or intercept term in a
linear model then the null model for comparison is one with the
intercept only.  If you have a linear model without an intercept term
then the appropriate null model for comparison is the model that
predicts all the responses as zero.  Thus SST, the "corrected" total
sum of squares, should be used when you have a model with an intercept
term but the uncorrected total sum of squares should be used when you
do not have an intercept term.

It is disappointing to see the MathWorks propagating such an
elementary misconception.