# [R] Some clarifications of anova() and summary()

Tanmoy Talukdar tanmoy.talukdar at gmail.com
Sun Dec 14 15:40:11 CET 2008

```
[Sorry for the repost. I forgot to switch off formatting last time.]

I have two assignment problems...

I have written this short piece of code for a regression with two regressors:

n <- 50
x1 <- runif(n,1,10)
x2 <- x1 + rnorm(n,0,0.5)
plot(x1,x2) # x1 and x2 strongly correlated
cor(x1,x2)
y <- 3 + 0.5*x1 + 1.1*x2 + rnorm(n,0,2)
intact.lm <- lm(y ~ x1 + x2)
summary(intact.lm)
anova(intact.lm)

The questions are:

1. The function summary() is convenient since the result does not
depend on the order in which the variables are listed in the linear
model definition. It has a serious downside, though, which is obvious
in this case. Are there any significant variables left?

2. An anova(intact.lm) table shows how much the second variable
contributes to the result in addition to the first. Is there a
significant variable now? Is the second variable significant?

The results I got:

> summary(intact.lm)

Call:
lm(formula = y ~ x1 + x2)

Residuals:
    Min      1Q  Median      3Q     Max
-5.5824 -1.5314 -0.1568  1.4425  5.3374

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.4857     0.9354   3.726 0.000521 ***
x1            0.2537     0.6117   0.415 0.680191
x2            1.3517     0.6025   2.244 0.029608 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.34 on 47 degrees of freedom
Multiple R-squared: 0.7483,     Adjusted R-squared: 0.7376
F-statistic: 69.87 on 2 and 47 DF,  p-value: 8.315e-15

> anova(intact.lm)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq  F value   Pr(>F)
x1         1 737.86  737.86 134.7129 2.11e-15 ***
x2         1  27.57   27.57   5.0338  0.02961 *
Residuals 47 257.43    5.48
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
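
# A minimal sketch of one way to compare the two tables: refit the same
# model with the regressors listed in the opposite order.
# Assumptions (not in the code above): the seed and the names fit12/fit21
# are illustrative only, so the numbers will differ from the output shown.
set.seed(1)
n  <- 50
x1 <- runif(n, 1, 10)
x2 <- x1 + rnorm(n, 0, 0.5)
y  <- 3 + 0.5 * x1 + 1.1 * x2 + rnorm(n, 0, 2)

fit12 <- lm(y ~ x1 + x2)   # x1 entered first
fit21 <- lm(y ~ x2 + x1)   # x2 entered first

# summary(): each t-test is for one term given all the other terms,
# so the rows for x1 and x2 are the same in both fits.
coef(summary(fit12))[c("x1", "x2"), ]
coef(summary(fit21))[c("x1", "x2"), ]

# anova(): sequential sums of squares, so each row tests a term given
# only the terms listed before it, and the table depends on the order.
anova(fit12)
anova(fit21)

# For the term entered last, the anova F value is the square of the
# t value reported by summary() for that term.
anova(fit12)["x2", "F value"]
coef(summary(fit12))["x2", "t value"]^2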

My question is this: I can't see any "serious downside" in using
summary(). And for the second question I am totally clueless. I need