[R] R vs. Excel (R-squared)

Kjetil Brinchmann Halvorsen kjetilbrinchmannhalvorsen at gmail.com
Wed Jan 25 18:12:58 CET 2006


Cleber N. Borges wrote:
> 
> 
> I was quite interested in this thread (discussion),
> since I am a chemistry student and I work with mixture designs, which
> are models without an intercept.
> 
> I paid close attention to the following statement:
> 
> ' Thus SST, the "corrected" total
> sum of squares, should be used when you have a model with an intercept
> term but the uncorrected total sum of squares should be used when you
> do not have an intercept term. ' (Douglas Bates)
> 
> 
> I have as reference a book called:
> 
> "Experiments with Mixtures: Designs, Models, and the Analysis of Mixture 
> Data"
> second edition
> 
> John A. Cornell
> (Professor of Statistics at the University of Florida)
> 
> 
> In this book, on p. 42, item 2.7 (The Analysis of Variance Table),
> the model below is given:
> 
> 
> y(x) = 11.7x1 + 9.4x2 + 16.4x3 + 19.0x1x2 + 11.4x1x3 - 9.6x2x3
> 
> 
> with the following ANOVA table:
> 
> 
> Source of variation   d.f.   SS                                    MS
> 
> Regression            p-1    SSR = \sum( y_{pred} - y_{mean} )^2   SSR/(p-1)
> 
> Residual              N-p    SSE = \sum( y_{exp} - y_{pred} )^2    SSE/(N-p)
> 
> Total                 N-1    SST = \sum( y_{exp} - y_{mean} )^2
> 
> 
> pred = predicted
> exp = experimental
> 
> In this book, and in many other books, I always see the ANOVA table
> for mixture systems built with SST, the "corrected" total sum of
> squares (N-1 degrees of freedom).
> 
> 
> 
> I would like to ask:
> 
> 1) Which approach (point of view) is more adequate?

With a mixture model, although you do not have an intercept term
explicitly in the model, it is there implicitly, because the design
variables representing the mixture sum to 1. So it is correct to use
the "corrected" sum of squares.

Kjetil
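
A minimal R sketch of this point, using simulated mixture data (the
variable names and simulated values below are illustrative, not from
the thread): because x1 + x2 + x3 = 1 for every blend, the column of
ones lies in the span of the component columns, so the no-intercept
Scheffe model and an equivalent reparameterisation with an explicit
intercept give identical fitted values.

## Sketch: the intercept is implicit in a Scheffe mixture model because
## the component proportions sum to one.
set.seed(1)
n  <- 30
x1 <- runif(n)
x2 <- runif(n) * (1 - x1)
x3 <- 1 - x1 - x2                  # proportions sum to exactly 1
y  <- 12*x1 + 9*x2 + 16*x3 + 18*x1*x2 + rnorm(n)   # arbitrary illustrative response

## Scheffe quadratic model, no explicit intercept
fit_mix <- lm(y ~ -1 + x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3)

## Same column space written with an explicit intercept
## (x1 dropped, since x1 = 1 - x2 - x3)
fit_int <- lm(y ~ x2 + x3 + x1:x2 + x1:x3 + x2:x3)

## Identical fitted values, so the mean-only null model (corrected SS)
## is the appropriate baseline for R^2 in both parameterisations
all.equal(fitted(fit_mix), fitted(fit_int))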

> 
> 2) Could someone point me to a reference on this subject?
> 
> 
> Thanks a lot.
> Regards
> 
> 
> Cleber N. Borges
> 
> 
> 
> 
> 
> ############################ Dados
> x1      x2      x3      y
> 1    0    0    11
> 1    0    0    12.4
> 0.5    0.5    0    15
> 0.5    0.5    0    14.8
> 0.5    0.5    0    16.1
> 0    1    0    8.8
> 0    1    0    10
> 0    0.5    0.5    10
> 0    0.5    0.5    9.7
> 0    0.5    0.5    11.8
> 0    0    1    16.8
> 0    0    1    16
> 0.5    0    0.5    17.7
> 0.5    0    0.5    16.4
> 0.5    0    0.5    16.6
> 
> ############################## Model
> 
> ## Scheffe quadratic mixture model, fitted without an explicit intercept
> d.lm <- lm( y ~ -1 + x1*x2*x3 - x1:x2:x3, data = Dados )
> 
> 
> 
> ### ANOVA as in the book
> d.aov <- aov( y ~ x1*x2*x3 - x1:x2:x3, data = Dados )
> #### SSR (fitted model) = 128.296
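
For concreteness, here is a small R sketch (added alongside the quoted
code, not part of the original message) that rebuilds the data listed
above and computes the quantities in the book's ANOVA table from the
no-intercept fit, using the corrected total sum of squares as the
baseline.

## Rebuild the quoted data and compute the ANOVA quantities with the
## corrected total sum of squares.
Dados <- read.table(text = "
x1   x2   x3   y
1    0    0    11
1    0    0    12.4
0.5  0.5  0    15
0.5  0.5  0    14.8
0.5  0.5  0    16.1
0    1    0    8.8
0    1    0    10
0    0.5  0.5  10
0    0.5  0.5  9.7
0    0.5  0.5  11.8
0    0    1    16.8
0    0    1    16
0.5  0    0.5  17.7
0.5  0    0.5  16.4
0.5  0    0.5  16.6
", header = TRUE)

## Scheffe quadratic mixture model, no explicit intercept
d.lm <- lm(y ~ -1 + x1*x2*x3 - x1:x2:x3, data = Dados)

SSE <- sum(residuals(d.lm)^2)               # residual SS
SST <- sum((Dados$y - mean(Dados$y))^2)     # corrected total SS (N - 1 d.f.)
SSR <- SST - SSE                            # regression SS, as in the book's table
R2  <- SSR / SST                            # R^2 relative to the mean-only model
c(SSR = SSR, SSE = SSE, SST = SST, R2 = R2)

Because the constant column is implicitly contained in the fitted
no-intercept mixture model, SSR computed this way should reproduce the
128.296 quoted above.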
> 
> Douglas Bates wrote:
> 
>> On 1/24/06, Lance Westerhoff <lance at quantumbioinc.com> wrote:
>>  
>>
>>> Hi-
>>>
>>> On Jan 24, 2006, at 12:08 PM, Peter Dalgaard wrote:
>>>
>>>    
>>>
>>>> Lance Westerhoff <lance at quantumbioinc.com> writes:
>>>>
>>>>      
>>>>
>>>>> Hello All-
>>>>>
>>>>> I found an inconsistency between the R-squared reported in Excel vs.
>>>>> that in R, and I am wondering which (if any) may be correct and if
>>>>> this is a known issue.  While it certainly wouldn't surprise me if
>>>>> Excel is just flat out wrong, I just want to make sure since the R-
>>>>> squared reported in R seems surprisingly high.  Please let me know if
>>>>> this is the wrong list.  Thanks!
>>>>>        
>>>>>
>>>> Excel is flat out wrong. As the name implies, R-squared values cannot
>>>> be less than zero (adjusted R-squared can, but I wouldn't think
>>>> that is what Excel does).
>>>>      
>>>>
>>> I had thought the same thing, but then I came across the following
>>> site which states: "Note that it is possible to get a negative R-
>>> square for equations that do not contain a constant term. If R-square
>>> is defined as the proportion of variance explained by the fit, and if
>>> the fit is actually worse than just fitting a horizontal line, then R-
>>> square is negative. In this case, R-square cannot be interpreted as
>>> the square of a correlation." Since
>>>
>>> R^2 = 1 - (SSE/SST)
>>>
>>> I guess you can have SSE > SST, which would result in an R^2 of less
>>> than zero.  However, it still seems very strange, which made me wonder
>>> what is going on in Excel, needless to say!
>>>
>>> http://www.mathworks.com/access/helpdesk/help/toolbox/curvefit/ch_fitt9.html
>>>    
>>>
>> This seems to be a case of using the wrong formula.  R^2 should
>> measure the amount of variation for which the given model accounts
>> relative to the amount of variation for which the *appropriate* null
>> model does not account.  If you have a constant or intercept term in a
>> linear model then the null model for comparison is one with the
>> intercept only.  If you have a linear model without an intercept term
>> then the appropriate null model for comparison is the model that
>> predicts all the responses as zero.  Thus SST, the "corrected" total
>> sum of squares, should be used when you have a model with an intercept
>> term but the uncorrected total sum of squares should be used when you
>> do not have an intercept term.
>>
>> It is disappointing to see the MathWorks propagating such an
>> elementary misconception.
>>
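
As a toy illustration of Douglas Bates's point above (not from the
original thread), R's own summary.lm() follows exactly this convention
(see ?summary.lm): the total sum of squares is taken about the mean of
y when the model has an intercept, and about zero when it does not.

## Reproduce the R^2 reported by summary.lm() by hand, with and
## without an intercept, on simulated data.
set.seed(42)
x <- runif(20, 1, 10)
y <- 3 + 2 * x + rnorm(20)

with_int    <- lm(y ~ x)        # null model for R^2: intercept-only (mean of y)
without_int <- lm(y ~ x - 1)    # null model for R^2: predict every response as 0

r2_with    <- 1 - sum(residuals(with_int)^2)    / sum((y - mean(y))^2)
r2_without <- 1 - sum(residuals(without_int)^2) / sum(y^2)

all.equal(r2_with,    summary(with_int)$r.squared)
all.equal(r2_without, summary(without_int)$r.squared)

Both comparisons should return TRUE, confirming that the choice of
total sum of squares depends on whether the model has an intercept.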



