[R] anova.lm and F-test

SKrishna madzientist at gmail.com
Mon Jul 9 18:36:13 CEST 2012


Dear Peter,

Thank you very much for that excellent answer to a rather stupid question :)
I had not noticed that the RSS actually increased for the model with more
parameters, so in this case the F-statistic is negative and a p-value from
the F-distribution is meaningless. I take it your answer also clarifies
that as long as the F-statistic is in the valid range (>= 0), anova() will
compute it and return a p-value, whether or not the models are nested.

Best, Suresh
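A minimal sketch of the arithmetic behind the F column, assuming the usual nested-model statistic F = ((RSS1 - RSS2)/(df1 - df2)) / (RSS2/df2), with the residual mean square of the model that has fewer residual df in the denominator. With the rounded RSS values from the first quoted table this gives about 0.173, close to the reported 0.1732. `f_compare` is a hypothetical helper, not an R function, and returning NA for a negative F is a choice made here to mirror the blank entries in the second table; it is not a claim about how anova.lm is implemented internally.

```r
# Hypothetical helper (not part of R): recompute the F statistic for
# comparing a smaller model (rss1, df1) with a larger one (rss2, df2).
# Assumes the denominator is the larger model's residual mean square.
f_compare <- function(rss1, df1, rss2, df2) {
  ddf  <- df1 - df2                  # extra parameters in model 2
  dss  <- rss1 - rss2                # reduction in RSS (negative if fit got worse)
  Fval <- (dss / ddf) / (rss2 / df2)
  # A negative F has no F-distribution p-value; report NA instead.
  p <- if (Fval >= 0) pf(Fval, ddf, df2, lower.tail = FALSE) else NA_real_
  c(F = Fval, p = p)
}

# Rounded values from the first table: y ~ x vs y ~ x + I(x^2)
f_compare(90.449, 98, 90.288, 97)

# Values from the second table: RSS goes up, so F is negative and p is NA
f_compare(90.4, 98, 7345.7, 97)
```

The second call shows the situation Peter points out: Df and Sum of Sq have opposite signs, so the ratio is negative and no p-value can be attached to it.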


Peter Dalgaard wrote:
> 
> On Jul 9, 2012, at 15:40 , Suresh Krishna wrote:
> 
>> 
>> Hello,
>> 
>> Why does anova.lm sometimes return a p-value and at other times not? Is
>> it because it distinguishes nested models from non-nested ones?
>> 
>>> x<-seq(1,100,1)
>>> y<-3*x+rnorm(100)
>>> anova(lm(y~x),lm(y~x+I(x^2)),test="F")
>> Analysis of Variance Table
>> 
>> Model 1: y ~ x
>> Model 2: y ~ x + I(x^2)
>>  Res.Df    RSS Df Sum of Sq      F Pr(>F)
>> 1     98 90.449
>> 2     97 90.288  1   0.16117 0.1732 0.6782
>> 
>>> anova(lm(y~x),lm(y~I(x^2)+I(x^3)),test="F")
>> Analysis of Variance Table
>> 
>> Model 1: y ~ x
>> Model 2: y ~ I(x^2) + I(x^3)
>>  Res.Df    RSS Df Sum of Sq F Pr(>F)
>> 1     98   90.4
>> 2     97 7345.7  1   -7255.3
>> 
> 
> You have Df and Sum of Sq with opposite signs, i.e. more parameters but a
> worse fit. The models are not nested, so the F test makes no sense.
> 
> I'd say that the real question is why anova.lm doesn't protest loudly when
> it detects this. One possible answer is that it also misses other
> non-nested cases where the signs do not clash, and warning only in some of
> the incorrect cases could lead the naive user to believe that the other
> ones are OK. E.g. this F test is equally meaningless:
> 
>> anova(lm(y~I(x^4)),lm(y~I(x^2)+I(x^3)),test="F")
> Analysis of Variance Table
> 
> Model 1: y ~ I(x^4)
> Model 2: y ~ I(x^2) + I(x^3)
>   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
> 1     98 186639                                  
> 2     97   7101  1    179538 2452.4 < 2.2e-16 ***
> 
> (Non-nestedness could in principle be determined by checking whether
> cbind(model.matrix(m1), model.matrix(m2)) has higher rank than both of its
> constituents, but numerical rank determination is a bit error-prone and
> slow, so this was not implemented.)
> 
> 
> -- 
> Peter Dalgaard, Professor
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes@  Priv: PDalgd@
> 
> 
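Peter's parenthetical rank check can be sketched as follows. This is a minimal illustration, not something anova.lm does: `is_nested` is a hypothetical helper, the QR tolerance of 1e-7 (qr()'s default) is an assumption, and, as Peter notes, numerical rank determination can be fragile for badly scaled design matrices. The helper treats m1 as the smaller model and asks whether its columns lie in the column space of m2's model matrix, i.e. whether appending them leaves the rank unchanged. The data reuse the question's setup, with set.seed() added for reproducibility.

```r
# Data as in the original question; set.seed() is an addition.
set.seed(1)
x <- seq(1, 100, 1)
y <- 3 * x + rnorm(100)

# Hypothetical helper (not part of R): is the smaller model m1 nested in m2?
# If model.matrix(m1) lies in the column space of model.matrix(m2),
# appending its columns does not increase the numerical rank.
# The tolerance is an assumption; rank detection is the fragile step.
is_nested <- function(m1, m2, tol = 1e-7) {
  X1 <- model.matrix(m1)
  X2 <- model.matrix(m2)
  rank2  <- qr(X2, tol = tol)$rank
  rank12 <- qr(cbind(X1, X2), tol = tol)$rank
  rank12 == rank2
}

is_nested(lm(y ~ x), lm(y ~ x + I(x^2)))             # TRUE
is_nested(lm(y ~ x), lm(y ~ I(x^2) + I(x^3)))        # FALSE
is_nested(lm(y ~ I(x^4)), lm(y ~ I(x^2) + I(x^3)))   # FALSE
```

This checks nesting in one direction only; Peter's formulation (combined rank higher than both constituents) is the test that neither model is nested in the other.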




