[R] summary vs anova
David Winsemius
dwinsemius at comcast.net
Mon Dec 19 16:00:06 CET 2011
On Dec 19, 2011, at 9:09 AM, Brent Pedersen wrote:
> Hi, I'm sure this is simple, but I haven't been able to find this in
> TFM,
> say I have some data in R like this (pasted here:
> http://pastebin.com/raw.php?i=sjS9Zkup):
One of the reasons this is not in TFM is that the answers to these
questions should be available in any first-course regression textbook.
>
>> head(df)
>   gender age smokes disease    Y
> 1 female  65   ever control 0.18
> 2 female  77  never control 0.12
> 3   male  40         state1 0.11
> 4 female  67   ever control 0.20
> 5   male  63   ever  state1 0.16
> 6 female  26  never  state1 0.13
>
> where unique(disease) == c("control", "state1", "state2")
> and unique(smokes) == c("ever", "never", "", "current")
>
> I then fit a linear model like:
>
>> model = lm(Y ~ smokes + disease + age + gender, data=df)
>
> And I want to understand the difference between:
>
>> print(summary(model))
> Call:
> lm(formula = Y ~ smokes + disease + age + gender, data = df)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -0.22311 -0.08108 -0.03483  0.05604  0.46507
>
> Coefficients:
>                 Estimate Std. Error t value Pr(>|t|)
> (Intercept)    0.1206825  0.0521368   2.315   0.0211 *
> smokescurrent  0.0150641  0.0444466   0.339   0.7348
> smokesever     0.0498764  0.0326254   1.529   0.1271
> smokesnever    0.0394109  0.0349142   1.129   0.2597
> diseasestate1  0.0018739  0.0176817   0.106   0.9157
> diseasestate2 -0.0009858  0.0178651  -0.055   0.9560
> age            0.0002841  0.0006290   0.452   0.6518
> gendermale     0.1164889  0.0128748   9.048   <2e-16 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.1257 on 397 degrees of freedom
> Multiple R-squared: 0.1933, Adjusted R-squared: 0.1791
> F-statistic: 13.59 on 7 and 397 DF, p-value: 8.975e-16
>
> and:
>
>> anova(model)
> Analysis of Variance Table
>
> Response: Y
>            Df Sum Sq Mean Sq F value  Pr(>F)
> smokes      3 0.1536 0.05120  3.2397 0.02215 *
> disease     2 0.0129 0.00647  0.4096 0.66420
> age         1 0.0431 0.04310  2.7270 0.09946 .
> gender      1 1.2937 1.29373 81.8634 < 2e-16 ***
> Residuals 397 6.2740 0.01580
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> I understand (hopefully correctly) that anova() tests by adding each
> covariate to the model in the order in which it is specified in the
> formula.
>
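The sequential behaviour described above can be checked with a minimal sketch. It assumes simulated data (the data frame `d` and its values are hypothetical stand-ins, not the pastebin data) and fits the same terms in two different orders. It also checks a relation that is visible in the output quoted above: for a 1-df term entered last, the anova() F value is the square of the summary() t value (9.048^2 is roughly 81.86 for gender).

```r
# Simulated stand-in data (hypothetical values, not the poster's data)
set.seed(1)
n <- 200
d <- data.frame(
  smokes = factor(sample(c("current", "ever", "never"), n, replace = TRUE)),
  age    = round(runif(n, 20, 80)),
  gender = factor(sample(c("female", "male"), n, replace = TRUE))
)
d$Y <- 0.12 + 0.1 * (d$gender == "male") + rnorm(n, sd = 0.12)

# Same terms, two different orders
fit_a <- lm(Y ~ smokes + age + gender, data = d)
fit_b <- lm(Y ~ gender + age + smokes, data = d)

# anova() on a single lm gives sequential tests: each term is tested
# after the terms listed above it, so the rows generally differ
# between the two orderings.
tab_a <- anova(fit_a)
tab_b <- anova(fit_b)
print(tab_a)
print(tab_b)

# For a 1-df term entered last, the sequential F equals the squared
# t statistic reported by summary().
t_gender <- coef(summary(fit_a))["gendermale", "t value"]
f_gender <- tab_a["gender", "F value"]
print(all.equal(f_gender, t_gender^2))
```

The residual row is the same in both tables because both orderings describe the same fitted model; only the way the explained sum of squares is split among the terms changes.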
> More specific questions are:
All of these are general statistics questions, which you are asked to
post to forums or lists that expect such questions ... and not to
r-help.
>
> 1) How do the p-values for smokes* in summary(model) relate to the
> Pr(>F) for smokes in anova?
> 2) What do the p-values for each of those smokes* mean exactly?
> 3) the summary above shows the values for diseasestate1 and
> diseasestate2
> how can I get the p-value for diseasecontrol? (or, e.g.
> genderfemale)
>
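For readers who want to experiment with the three questions, here is a hedged sketch on simulated data (the data frame `d`, the column `disease2`, and all values are hypothetical). It assumes R's default treatment contrasts, under which each factor-level row in summary() is a t-test against that factor's reference level, and the reference level itself gets no row. drop1() with test = "F" gives a single F-test per term adjusted for the other terms, and relevel() changes which level serves as the reference. The interpretation of these tests is textbook material, as noted above.

```r
# Simulated stand-in data (hypothetical values, not the poster's data)
set.seed(2)
n <- 200
d <- data.frame(
  smokes  = factor(sample(c("current", "ever", "never"), n, replace = TRUE)),
  disease = factor(sample(c("control", "state1", "state2"), n, replace = TRUE))
)
d$Y <- 0.15 + 0.03 * (d$disease == "state2") + rnorm(n, sd = 0.1)

fit <- lm(Y ~ smokes + disease, data = d)

# Each smokes*/disease* row is a t-test of that level against the
# reference level (the first factor level: "current" and "control" here).
print(coef(summary(fit)))

# One F-test per term, each adjusted for the other term in the model
print(drop1(fit, test = "F"))

# There is no diseasecontrol row because "control" is the reference level.
# Re-fitting with another reference gives a control-vs-state1 row instead.
d$disease2 <- relevel(d$disease, ref = "state1")
fit2 <- lm(Y ~ smokes + disease2, data = d)
print(coef(summary(fit2)))
```

The two fits are the same model in a different parameterisation: the fitted values agree, and the disease2control coefficient in the second fit is the negative of diseasestate1 in the first.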
>
> ^^^^^^^^^^^^^^^^^^^^^^^
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
-------------------
David Winsemius, MD
West Hartford, CT