[R] Some clarifications of anova() and summary()
Tanmoy Talukdar
tanmoy.talukdar at gmail.com
Sun Dec 14 17:09:04 CET 2008
Can anyone please explain why this happens? I know this happens when x1
and x2 have different sizes, but here x1 and x2 have the same dimension.
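A minimal sketch, assuming the seeded data posted further down in the
thread, of why the order matters: the two fits are identical, but anova()
tests terms sequentially, in the order they are listed.

set.seed(127)
n  <- 50
x1 <- runif(n, 1, 10)
x2 <- x1 + rnorm(n, 0, 0.5)            # x1 and x2 strongly correlated
y  <- 3 + 0.5*x1 + 1.1*x2 + rnorm(n, 0, 2)

intact12 <- lm(y ~ x1 + x2)
intact21 <- lm(y ~ x2 + x1)

all.equal(fitted(intact12), fitted(intact21))  # TRUE: the fits are identical

# anova() uses sequential (Type I) sums of squares: each row tests a term
# given only the terms listed before it.  The x2 row of anova(intact12) is
# therefore the nested-model comparison of y ~ x1 against y ~ x1 + x2:
anova(lm(y ~ x1), intact12)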
On Sun, Dec 14, 2008 at 9:26 PM, Tanmoy Talukdar
<tanmoy.talukdar at gmail.com> wrote:
> Running anova() on intact12 and intact21 gives two different results!
>
>> anova(intact12)
> Analysis of Variance Table
>
> Response: y
> Df Sum Sq Mean Sq F value Pr(>F)
> x1 1 663.18 663.18 203.065 < 2.2e-16 ***
> x2 1 35.21 35.21 10.781 0.001940 **
> Residuals 47 153.49 3.27
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>> anova(intact21)
> Analysis of Variance Table
>
> Response: y
> Df Sum Sq Mean Sq F value Pr(>F)
> x2 1 698.26 698.26 213.8077 <2e-16 ***
> x1 1 0.12 0.12 0.0379 0.8466
> Residuals 47 153.49 3.27
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>
> On Sun, Dec 14, 2008 at 8:56 PM, Tanmoy Talukdar
> <tanmoy.talukdar at gmail.com> wrote:
>> Why do you think that running lm() twice on those two models is going
>> to help me? They are identical models, and hence we get identical
>> results. The second question is now all right; I had some
>> misunderstanding about it.
>>
>> Please tell me if you can find any "downside" in summary(). I can't find any.
>>
>>
>> I've edited the code to address that replication issue.
>>
>> set.seed(127)
>> n <- 50
>> x1 <- runif(n,1,10)
>> x2 <- x1 + rnorm(n,0,0.5)
>> plot(x1,x2) # x1 and x2 strongly correlated
>> cor(x1,x2)
>> y <- 3 + 0.5*x1 + 1.1*x2 + rnorm(n,0,2)
>> intact.lm <- lm(y ~ x1 + x2)
>> summary(intact.lm)
>> anova(intact.lm)
>>
>>
>>> summary(intact.lm)
>>
>> Call:
>> lm(formula = y ~ x1 + x2)
>>
>> Residuals:
>> Min 1Q Median 3Q Max
>> -3.4578 -1.1326 0.4551 1.2807 4.8241
>>
>> Coefficients:
>> Estimate Std. Error t value Pr(>|t|)
>> (Intercept) 3.63603 0.61944 5.870 4.23e-07 ***
>> x1 -0.09555 0.49114 -0.195 0.84658
>> x2 1.59384 0.48542 3.283 0.00194 **
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> Residual standard error: 1.807 on 47 degrees of freedom
>> Multiple R-squared: 0.8198, Adjusted R-squared: 0.8121
>> F-statistic: 106.9 on 2 and 47 DF, p-value: < 2.2e-16
>>
>>> anova(intact.lm)
>> Analysis of Variance Table
>>
>> Response: y
>> Df Sum Sq Mean Sq F value Pr(>F)
>> x1 1 663.18 663.18 203.065 < 2.2e-16 ***
>> x2 1 35.21 35.21 10.781 0.001940 **
>> Residuals 47 153.49 3.27
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> On Sun, Dec 14, 2008 at 8:26 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>
>>> On Dec 14, 2008, at 9:40 AM, Tanmoy Talukdar wrote:
>>>
>>>> [sorry for the repost. I forgot to switch off formatting last time]
>>>>
>>>> I have two assignment problems...
>>>>
>>>> I have written this small piece of code for regression with two regressors.
>>>>
>>> For replication purposes, it might be good to set a seed for the random
>>> number generation.
>>>
>>> set.seed(127)
>>>>
>>>> n <- 50
>>>> x1 <- runif(n,1,10)
>>>> x2 <- x1 + rnorm(n,0,0.5)
>>>> plot(x1,x2) # x1 and x2 strongly correlated
>>>> cor(x1,x2)
>>>> y <- 3 + 0.5*x1 + 1.1*x2 + rnorm(n,0,2)
>>>> intact.lm <- lm(y ~ x1 + x2)
>>>> summary(intact.lm)
>>>> anova(intact.lm)
>>>>
>>> You should also run anova on these models:
>>>
>>> intact21 <- lm(y~x2+x1)
>>> intact12 <- lm(y~x1+x2)
>>>
>>>>
>>>> The questions are:
>>>>
>>>> 1. The function summary() is convenient since the result does not
>>>> depend on the order in which the variables are listed in the linear
>>>> model definition. It has a serious downside, though, which is obvious
>>>> in this case. Are there any significant variables left? (See the
>>>> sketch after these questions.)
>>>>
>>>> 2. An anova(intact.lm) table shows how much the second variable
>>>> contributes to the result in addition to the first. Is there a
>>>> variable significant now? Is the second variable significant?
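One possible reading of the "serious downside" in question 1, as a minimal
sketch using the thread's seeded data: with strongly correlated regressors,
each t test in summary() is conditional on the other variable, so a
regressor that predicts y well on its own can appear non-significant.

set.seed(127)
n  <- 50
x1 <- runif(n, 1, 10)
x2 <- x1 + rnorm(n, 0, 0.5)
y  <- 3 + 0.5*x1 + 1.1*x2 + rnorm(n, 0, 2)

summary(lm(y ~ x1 + x2))$coefficients  # x1 appears non-significant here
summary(lm(y ~ x1))$coefficients       # x1 alone is highly significant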
>>>
>>> Both anova and summary agree that the p-value for adding x2 to a model
>>> that already includes x1 is 0.0296. One of them uses the t statistic and
>>> the other the F statistic. I am not sure where your confusion lies.
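A minimal check of this point, assuming the thread's seeded code has been
run: for the variable listed last in the formula, the F statistic in
anova() is the square of the t statistic in summary(), so the two p-values
agree.

fit  <- lm(y ~ x1 + x2)                 # same model as intact.lm above
t_x2 <- coef(summary(fit))["x2", "t value"]
F_x2 <- anova(fit)["x2", "F value"]
all.equal(t_x2^2, F_x2)                 # TRUE: same test, hence the same p-value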
>>>
>>> --
>>> David Winsemius
>>>
>>>>
>>>>
>>>> The results I got:
>>>>
>>>>> summary(intact.lm)
>>>>
>>>> Call:
>>>> lm(formula = y ~ x1 + x2)
>>>>
>>>> Residuals:
>>>> Min 1Q Median 3Q Max
>>>> -5.5824 -1.5314 -0.1568 1.4425 5.3374
>>>>
>>>> Coefficients:
>>>> Estimate Std. Error t value Pr(>|t|)
>>>> (Intercept) 3.4857 0.9354 3.726 0.000521 ***
>>>> x1 0.2537 0.6117 0.415 0.680191
>>>> x2 1.3517 0.6025 2.244 0.029608 *
>>>> ---
>>>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>>
>>>> Residual standard error: 2.34 on 47 degrees of freedom
>>>> Multiple R-squared: 0.7483, Adjusted R-squared: 0.7376
>>>> F-statistic: 69.87 on 2 and 47 DF, p-value: 8.315e-15
>>>>
>>>>> anova(intact.lm)
>>>>
>>>> Analysis of Variance Table
>>>>
>>>> Response: y
>>>> Df Sum Sq Mean Sq F value Pr(>F)
>>>> x1 1 737.86 737.86 134.7129 2.11e-15 ***
>>>> x2 1 27.57 27.57 5.0338 0.02961 *
>>>> Residuals 47 257.43 5.48
>>>> ---
>>>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>>
>>>>
>>>>
>>>> My question is that I can't see any "serious downside" in using
>>>> summary(). And on the second question I am totally clueless. I need
>>>> your help.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>