[R] R vs. Excel (R-squared)
Cleber N. Borges
cborges at iqm.unicamp.br
Wed Jan 25 17:32:58 CET 2006
I was quite interested in this thread (discussion),
once that I am chemistry student and I work with Mixtures Designs that are
models without intercept.
I thought quite attention the follow afirmation:
' Thus SST, the "corrected" total
sum of squares, should be used when you have a model with an intercept
term but the uncorrected total sum of squares should be used when you
do not have an intercept term. ' (Douglas Bates)
I have as reference a book called:
"Experiments with Mixtures: Designs, Models, and the Analysis of Mixture
Data"
second edition
John A. Cornell
(Professor of Statistics in University Of Florida)
In this book, pg 42: item 2.7 - THE ANALYSIS OF VARIANCE TABLE,
I have the model below:
y(x) = 11.7x1 + 9.4x2 + 16.4x3 + 19.0x1x2 + 11.4x1x3 - 9.6x2x3
with the follow ANOVA Table:
source of variation D.F. SS MS
Regression p-1 SSR=\sum( y_{pred} - y_{mean} )^2 ssR/(p-1)
Residual N-p SSE=\sum( y_{exp} - y_{pred} )^2 ssE/(N-p)
Total N-1 SSE=\sum( y_{exp} - y_{mean} )^2
pred = predicted
exp = experimental
and in many others books.
I always see the ANOVA Table of Mixtures systems with SST, the
"corrected" total
sum of squares ( N-1 degrees freedom ).
I would like to ask:
1) What is approach ( point view ) more adequate ?
2) Could someone indicate some reference about this subject ?
Thanks a lot.
Regards
Cleber N. Borges
############################ Dados
x1 x2 x3 y
1 0 0 11
1 0 0 12.4
0.5 0.5 0 15
0.5 0.5 0 14.8
0.5 0.5 0 16.1
0 1 0 8.8
0 1 0 10
0 0.5 0.5 10
0 0.5 0.5 9.7
0 0.5 0.5 11.8
0 0 1 16.8
0 0 1 16
0.5 0 0.5 17.7
0.5 0 0.5 16.4
0.5 0 0.5 16.6
############################## Model
d.lm <- lm( y ~ -1 + x1*x2*x3 - x1:x2:x3, data = Dados )
### Anova like in the book
d.aov <- aov( y ~ x1*x2*x3 - x1:x2:x3, data = Dados )
#### SSR (fitted Model) = 128.296
Douglas Bates wrote:
>On 1/24/06, Lance Westerhoff <lance at quantumbioinc.com> wrote:
>
>
>>Hi-
>>
>>On Jan 24, 2006, at 12:08 PM, Peter Dalgaard wrote:
>>
>>
>>
>>>Lance Westerhoff <lance at quantumbioinc.com> writes:
>>>
>>>
>>>
>>>>Hello All-
>>>>
>>>>I found an inconsistency between the R-squared reported in Excel vs.
>>>>that in R, and I am wondering which (if any) may be correct and if
>>>>this is a known issue. While it certainly wouldn't surprise me if
>>>>Excel is just flat out wrong, I just want to make sure since the R-
>>>>squared reported in R seems surprisingly high. Please let me know if
>>>>this is the wrong list. Thanks!
>>>>
>>>>
>>>Excel is flat out wrong. As the name implies, R-squared values cannot
>>>be less than zero (adjusted R-squared can, but I wouldn't think
>>>that is what Excel does).
>>>
>>>
>>I had thought the same thing, but then I came across the following
>>site which states: "Note that it is possible to get a negative R-
>>square for equations that do not contain a constant term. If R-square
>>is defined as the proportion of variance explained by the fit, and if
>>the fit is actually worse than just fitting a horizontal line, then R-
>>square is negative. In this case, R-square cannot be interpreted as
>>the square of a correlation." Since
>>
>>R^2 = 1 - (SSE/SST)
>>
>>I guess you can have SSE > SST which would result in a R^2 of less
>>then 1.0. However, it still seems very strange which made me wonder
>>what is going on in Excel needless to say!
>>
>>http://www.mathworks.com/access/helpdesk/help/toolbox/curvefit/
>>ch_fitt9.html
>>
>>
>
>This seems to be a case of using the wrong formula. R^2 should
>measure the amount of variation for which the given model accounts
>relative to the amount of variation for which the *appropriate* null
>model does not account. If you have a constant or intercept term in a
>linear model then the null model for comparison is one with the
>intercept only. If you have a linear model without an intercept term
>then the appropriate null model for comparison is the model that
>predicts all the responses as zero. Thus SST, the "corrected" total
>sum of squares, should be used when you have a model with an intercept
>term but the uncorrected total sum of squares should be used when you
>do not have an intercept term.
>
>It is disappointing to see the MathWorks propagating such an
>elementary misconception.
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
>.
>
>
>
More information about the R-help
mailing list