# [R] Some clarifications of anova() and summary()

David Winsemius dwinsemius at comcast.net
Sun Dec 14 15:56:33 CET 2008

```
On Dec 14, 2008, at 9:40 AM, Tanmoy Talukdar wrote:

> [sorry for the repost. I forgot to switch off formatting last time]
>
> I have two assignment problems...
>
> I have written this small code for regression with two regressors.
>
For replication purposes, it might be good to set a seed for the random
number generation.

set.seed(127)
> n <- 50
> x1 <- runif(n,1,10)
> x2 <- x1 + rnorm(n,0,0.5)
> plot(x1,x2) # x1 and x2 strongly correlated
> cor(x1,x2)
> y <- 3 + 0.5*x1 + 1.1*x2 + rnorm(n,0,2)
> intact.lm <- lm(y ~ x1 + x2)
> summary(intact.lm)
> anova(intact.lm)
>
You should also run anova on these models:

intact21 <- lm(y~x2+x1)
intact12 <- lm(y~x1+x2)

>
> the questions are
>
> 1. The function summary() is convenient since the result does not
> depend on the order in which the variables
> are listed in the linear model definition. It has a serious downside,
> though, which is obvious in this case.
> Are there any significant variables left?
>
> 2. An anova(intact.lm) table shows how much the second variable
> contributes to the result in
> addition to the first. Is there a significant variable now? Is the
> second variable significant?

Both anova and summary agree that the P-value for adding x2 to a
model that already included x1 is 0.0296. One of them uses the t
statistic and the other uses the F statistic. I am not sure where
your confusion lies.
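
# A minimal sketch of that agreement, assuming data simulated as in the
# example above; the seed and the names fit12, t.x2 and F.x2 are
# illustrative, so the numbers will differ from the quoted output.
set.seed(127)
n  <- 50
x1 <- runif(n, 1, 10)
x2 <- x1 + rnorm(n, 0, 0.5)
y  <- 3 + 0.5*x1 + 1.1*x2 + rnorm(n, 0, 2)
fit12 <- lm(y ~ x1 + x2)
# For a term with one degree of freedom, the squared t value from summary()
# equals the F value that anova() reports for the last term entered.
t.x2 <- coef(summary(fit12))["x2", "t value"]
F.x2 <- anova(fit12)["x2", "F value"]
all.equal(t.x2^2, F.x2)
# In the quoted output this is 2.244^2, about 5.04, against F = 5.0338.
# Reversing the order makes x1 the last term, so its sequential F test
# then matches the x1 row of summary().
anova(lm(y ~ x2 + x1))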

--
David Winsemius

>
>
> the results i got:
>
>> summary(intact.lm)
>
> Call:
> lm(formula = y ~ x1 + x2)
>
> Residuals:
>    Min      1Q  Median      3Q     Max
> -5.5824 -1.5314 -0.1568  1.4425  5.3374
>
> Coefficients:
>            Estimate Std. Error t value Pr(>|t|)
> (Intercept)   3.4857     0.9354   3.726 0.000521 ***
> x1            0.2537     0.6117   0.415 0.680191
> x2            1.3517     0.6025   2.244 0.029608 *
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 2.34 on 47 degrees of freedom
> Multiple R-squared: 0.7483,     Adjusted R-squared: 0.7376
> F-statistic: 69.87 on 2 and 47 DF,  p-value: 8.315e-15
>
>> anova(intact.lm)
> Analysis of Variance Table
>
> Response: y
>          Df Sum Sq Mean Sq  F value   Pr(>F)
> x1         1 737.86  737.86 134.7129 2.11e-15 ***
> x2         1  27.57   27.57   5.0338  0.02961 *
> Residuals 47 257.43    5.48
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>
>
> My question is that I can't see any "serious downside" in using
> summary(). And in the second question I am totally clueless. I need