[R] R vs. Excel (R-squared)

Wed Jan 25 17:32:58 CET 2006

I was quite interested in this thread (discussion),
once that I am chemistry student and I work with Mixtures Designs that are
models without intercept.

I thought quite attention the follow afirmation:

' Thus SST, the "corrected" total
sum of squares, should be used when you have a model with an intercept
term but the uncorrected total sum of squares should be used when you
do not have an intercept term. ' (Douglas Bates)

I have as reference a book called:

"Experiments with Mixtures: Designs, Models, and the Analysis of Mixture 
Data"
second edition

John A. Cornell
(Professor of Statistics in University Of Florida)

In this book, pg 42: item 2.7 - THE ANALYSIS OF VARIANCE TABLE,
I have the model below:

y(x) = 11.7x1 + 9.4x2 + 16.4x3 + 19.0x1x2 + 11.4x1x3 - 9.6x2x3

with the follow ANOVA Table:

source of variation    D.F.    SS                    MS

Regression        p-1    SSR=\sum( y_{pred} - y_{mean} )^2    ssR/(p-1)

Residual        N-p    SSE=\sum( y_{exp} - y_{pred} )^2    ssE/(N-p)   

Total            N-1    SSE=\sum( y_{exp} - y_{mean} )^2

pred = predicted
exp = experimental

and in many others books.

I always see the ANOVA Table of Mixtures systems with SST, the 
"corrected" total
sum of squares ( N-1 degrees freedom ).

I would like to ask:

1) What is approach ( point view ) more adequate ?

2) Could someone indicate some reference about this subject ?

Thanks a lot.
Regards

Cleber N. Borges

############################ Dados
x1      x2      x3      y
1    0    0    11
1    0    0    12.4
0.5    0.5    0    15
0.5    0.5    0    14.8
0.5    0.5    0    16.1
0    1    0    8.8
0    1    0    10
0    0.5    0.5    10
0    0.5    0.5    9.7
0    0.5    0.5    11.8
0    0    1    16.8
0    0    1    16
0.5    0    0.5    17.7
0.5    0    0.5    16.4
0.5    0    0.5    16.6

############################## Model

d.lm <- lm( y ~ -1 + x1*x2*x3 - x1:x2:x3, data = Dados )

### Anova like in the book
d.aov <- aov( y ~  x1*x2*x3 - x1:x2:x3, data = Dados )
#### SSR (fitted Model) = 128.296

Douglas Bates wrote:

>On 1/24/06, Lance Westerhoff <lance at quantumbioinc.com> wrote:
>  
>
>>Hi-
>>
>>On Jan 24, 2006, at 12:08 PM, Peter Dalgaard wrote:
>>
>>    
>>
>>>Lance Westerhoff <lance at quantumbioinc.com> writes:
>>>
>>>      
>>>
>>>>Hello All-
>>>>
>>>>I found an inconsistency between the R-squared reported in Excel vs.
>>>>that in R, and I am wondering which (if any) may be correct and if
>>>>this is a known issue.  While it certainly wouldn't surprise me if
>>>>Excel is just flat out wrong, I just want to make sure since the R-
>>>>squared reported in R seems surprisingly high.  Please let me know if
>>>>this is the wrong list.  Thanks!
>>>>        
>>>>
>>>Excel is flat out wrong. As the name implies, R-squared values cannot
>>>be less than zero (adjusted R-squared can, but I wouldn't think
>>>that is what Excel does).
>>>      
>>>
>>I had thought the same thing, but then I came across the following
>>site which states: "Note that it is possible to get a negative R-
>>square for equations that do not contain a constant term. If R-square
>>is defined as the proportion of variance explained by the fit, and if
>>the fit is actually worse than just fitting a horizontal line, then R-
>>square is negative. In this case, R-square cannot be interpreted as
>>the square of a correlation." Since
>>
>>R^2 = 1 - (SSE/SST)
>>
>>I guess you can have SSE > SST which would result in a R^2 of less
>>then 1.0.  However, it still seems very strange which made me wonder
>>what is going on in Excel needless to say!
>>
>>http://www.mathworks.com/access/helpdesk/help/toolbox/curvefit/
>>ch_fitt9.html
>>    
>>
>
>This seems to be a case of using the wrong formula.  R^2 should
>measure the amount of variation for which the given model accounts
>relative to the amount of variation for which the *appropriate* null
>model does not account.  If you have a constant or intercept term in a
>linear model then the null model for comparison is one with the
>intercept only.  If you have a linear model without an intercept term
>then the appropriate null model for comparison is the model that
>predicts all the responses as zero.  Thus SST, the "corrected" total
>sum of squares, should be used when you have a model with an intercept
>term but the uncorrected total sum of squares should be used when you
>do not have an intercept term.
>
>It is disappointing to see the MathWorks propagating such an
>elementary misconception.
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
>.
>
>  
>