[R] NEWBIE: Help explaining use of lm()?
Zembower, Kevin
kzembowe at jhuccp.org
Tue Nov 21 20:46:11 CET 2006
I'm attempting the Herculean task of teaching myself Introductory
Statistics and R at the same time. I'm working through Peter Dalgaard's
Introductory Statistics with R, but don't understand why the answer to
one of the exercises works. I'm hoping someone will have the patience to
explain the answer to me, both in the statistics and R areas.
Exercise 6.1 says:
The zelazo data are in the form of a list of vectors, one for each of
the four groups. Convert the data to a form suitable for the use of lm,
and calculate the relevant tests. ...
This stumped me right from the beginning. I thought I understood that
a linear model relates an independent variable (such as the amount of a
hormone administered) to a dependent variable (such as the height of a
cornstalk), and that its output is a model that can state, "for every
10% increase in the hormone, the height increased by X%."
The zelazo data are the ages at walking (in months) of four groups of
infants, two controls and two experimentals subjected to different
exercise regimens. I don't understand why lm() can be used at all in
this circumstance. My initial attempt was to use t.test(), which the
answer key also uses. I would never have thought to use lm() except for
the requirement in the problem. I've pasted in the output of the
exercise below, for those without the dataset. Would someone explain why
lm() is appropriate to use in this situation, and what the results mean
'in plain English'?
Thanks for your patience with a newbie. I'm comfortable asking this
question of this group because of the understanding it has almost
always shown in explaining and teaching statistics.
Kevin Zembower
Internet Services Group manager
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
111 Market Place, Suite 310
Baltimore, Maryland 21202
410-659-6139
============================================================
> library("ISwR")
Loading required package: survival
Loading required package: splines
Attaching package: 'ISwR'
The following object(s) are masked from package:survival :
lung
> data(zelazo)
> head(zelazo)
$active
[1]  9.00  9.50  9.75 10.00 13.00  9.50

$passive
[1] 11.00 10.00 10.00 11.75 10.50 15.00

$none
[1] 11.50 12.00  9.00 11.50 13.25 13.00

$ctr.8w
[1] 13.25 11.50 12.00 13.50 11.50

> walk <- unlist(zelazo)
> walk
 active1  active2  active3  active4  active5  active6 passive1 passive2
    9.00     9.50     9.75    10.00    13.00     9.50    11.00    10.00
passive3 passive4 passive5 passive6    none1    none2    none3    none4
   10.00    11.75    10.50    15.00    11.50    12.00     9.00    11.50
   none5    none6  ctr.8w1  ctr.8w2  ctr.8w3  ctr.8w4  ctr.8w5
   13.25    13.00    13.25    11.50    12.00    13.50    11.50
> group <- factor(rep(1:4,c(6,6,6,5)), labels=names(zelazo))
> group
 [1] active  active  active  active  active  active  passive passive passive
[10] passive passive passive none    none    none    none    none    none
[19] ctr.8w  ctr.8w  ctr.8w  ctr.8w  ctr.8w
Levels: active passive none ctr.8w
> summary(lm(walk ~ group))
Call:
lm(formula = walk ~ group)

Residuals:
    Min      1Q  Median      3Q     Max
-2.7083 -0.8500 -0.3500  0.6375  3.6250

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   10.1250     0.6191  16.355 1.19e-12 ***
grouppassive   1.2500     0.8755   1.428   0.1696
groupnone      1.5833     0.8755   1.809   0.0864 .
groupctr.8w    2.2250     0.9182   2.423   0.0255 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.516 on 19 degrees of freedom
Multiple R-Squared: 0.2528, Adjusted R-squared: 0.1348
F-statistic: 2.142 on 3 and 19 DF, p-value: 0.1285
>
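
For anyone without the ISwR package, the following is a minimal sketch
that retypes the data from the head(zelazo) output above (assuming those
printed values are the complete dataset) and sets the lm() coefficients
next to plain group means. The names `means` and `diffs` are only
illustrative and do not come from the book or the answer key.

```r
# Ages at walking (months), typed in from the head(zelazo) output above.
# Assumption: those printed values are the whole dataset.
zelazo <- list(
  active  = c(9.00, 9.50, 9.75, 10.00, 13.00, 9.50),
  passive = c(11.00, 10.00, 10.00, 11.75, 10.50, 15.00),
  none    = c(11.50, 12.00, 9.00, 11.50, 13.25, 13.00),
  ctr.8w  = c(13.25, 11.50, 12.00, 13.50, 11.50)
)

# Same conversion as in the session: one response vector, one grouping factor.
walk  <- unlist(zelazo)
group <- factor(rep(names(zelazo), lengths(zelazo)), levels = names(zelazo))

fit <- lm(walk ~ group)

# Mean of each group, and each mean minus the mean of the first
# factor level ("active").
means <- sapply(zelazo, mean)
diffs <- means - means[["active"]]

print(round(means, 4))      # plain group means
print(round(diffs, 4))      # differences from the "active" mean
print(round(coef(fit), 4))  # lm() estimates, for comparison
```

With R's default treatment contrasts, the (Intercept) estimate should
equal the mean of the first level (active), and each remaining estimate
should equal that group's mean minus the active mean. That comparison
may be a useful starting point for reading the coefficient table above.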