[R] NEWBIE: Help explaining use of lm()?

Tue Nov 21 20:46:11 CET 2006

I'm attempting the herculean task of teaching myself introductory
statistics and R at the same time. I'm working through Peter Dalgaard's
Introductory Statistics with R, but I don't understand why the answer to
one of the exercises works. I'm hoping someone will have the patience to
explain the answer to me, from both the statistics side and the R side.

Exercise 6.1 says:
The zelazo data are in the form of a list of vectors, one for each of
the four groups. Convert the data to a form suitable for the use of lm,
and calculate the relevant tests. ...
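A minimal sketch of that conversion, assuming base R only. The four vectors are retyped from the head(zelazo) output in the transcript below, so the block runs without the ISwR package. utils::stack() turns the named list into a two-column data frame with one row per infant. The explicit factor(..., levels = ...) call keeps the groups in the list's original order instead of relying on stack()'s default level ordering.

```r
# Ages at walking (months), retyped from the head(zelazo) output in this post
zelazo <- list(
  active  = c(9.00, 9.50, 9.75, 10.00, 13.00, 9.50),
  passive = c(11.00, 10.00, 10.00, 11.75, 10.50, 15.00),
  none    = c(11.50, 12.00, 9.00, 11.50, 13.25, 13.00),
  ctr.8w  = c(13.25, 11.50, 12.00, 13.50, 11.50)
)

# stack() gives one row per infant: 'values' holds the response and
# 'ind' records which list element (group) each value came from
long <- stack(zelazo)
names(long) <- c("walk", "group")

# Keep the list order so that 'active' is the first (baseline) level
long$group <- factor(long$group, levels = names(zelazo))

str(long)
```

The result is the "long" layout that lm() expects: one response column and one grouping column, equivalent to the walk and group objects built by hand in the transcript.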

This stumped me right from the beginning. I thought I understood that
linear models tried to relate an independent variable (such as the
amount of a hormone administered) to a dependent variable (such as the
height of a cornstalk). The output was a model that could state, "for
every 10% increase in the hormone, the height increased by X%."
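A two-group comparison can be written as exactly this kind of model. A minimal sketch, assuming the predictor is coded by hand as a numeric 0/1 variable (0 = active, 1 = ctr.8w; the values are two of the zelazo groups from the transcript below). The slope is then "the change in mean walking age when the predictor goes from 0 to 1", which is simply the difference between the two group means.

```r
# Two of the zelazo groups (ages at walking, months)
active <- c(9.00, 9.50, 9.75, 10.00, 13.00, 9.50)
ctr8w  <- c(13.25, 11.50, 12.00, 13.50, 11.50)

walk <- c(active, ctr8w)
# Hand-made numeric "dose": 0 = active exercise, 1 = 8-week control
x <- c(rep(0, length(active)), rep(1, length(ctr8w)))

fit <- lm(walk ~ x)
coef(fit)

# Intercept = mean walking age when x = 0 (the active group);
# slope     = change in the mean when x rises by one unit (active -> ctr.8w)
c(mean_active = mean(active), diff = mean(ctr8w) - mean(active))
```

Both lines print the same two numbers (10.125 and 2.225), matching the (Intercept) and groupctr.8w estimates in the summary() output below.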

The zelazo data are the ages at walking (in months) of four groups of
infants: two control groups and two experimental groups subjected to
different exercise regimens. I don't understand why lm() can be used at
all in this situation, where the only "independent variable" is which
group an infant belongs to. My initial attempt was to use t.test(), which
the answer key also does. I would never have thought to use lm() if the
problem hadn't required it. I've pasted the output of the exercise below
for those without the dataset. Would someone explain why lm() is
appropriate in this situation, and what the results mean in plain
English?
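A minimal sketch of what lm() does with a factor, using the zelazo values from the transcript below (retyped so the block runs without ISwR). Under R's default treatment contrasts, group becomes an intercept plus three 0/1 indicator columns, one per non-baseline level. Each coefficient is therefore a difference in group means relative to active. For example, groupctr.8w = 2.225 says the 8-week control infants walked about 2.2 months later on average than the active-exercise infants. This is a one-way ANOVA written as a regression.

```r
zelazo <- list(
  active  = c(9.00, 9.50, 9.75, 10.00, 13.00, 9.50),
  passive = c(11.00, 10.00, 10.00, 11.75, 10.50, 15.00),
  none    = c(11.50, 12.00, 9.00, 11.50, 13.25, 13.00),
  ctr.8w  = c(13.25, 11.50, 12.00, 13.50, 11.50)
)
walk  <- unlist(zelazo)
group <- factor(rep(1:4, c(6, 6, 6, 5)), labels = names(zelazo))

# The 0/1 "dummy" columns lm() builds from the factor
X <- model.matrix(~ group)
colnames(X)

fit <- lm(walk ~ group)
group_means <- tapply(walk, group, mean)

# Intercept = mean of the baseline group (active);
# every other coefficient = that group's mean minus the active mean
cbind(coef       = coef(fit),
      from_means = c(group_means[1], group_means[-1] - group_means[1]))
```

One caveat: the t values in summary() use a residual standard deviation pooled over all four groups (19 degrees of freedom). They will therefore not match a separate two-group t.test() exactly, even though the estimated mean differences are the same.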

Thanks for your patience with a newbie. I'm comfortable asking this
question here because of the understanding this group has almost
always shown in explaining and teaching statistics.

Kevin Zembower
Internet Services Group manager
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
111 Market Place, Suite 310
Baltimore, Maryland  21202
410-659-6139 
============================================================
> library("ISwR")
Loading required package: survival
Loading required package: splines

Attaching package: 'ISwR'

        The following object(s) are masked from package:survival :

         lung 

> data(zelazo)
> head(zelazo)
$active
[1]  9.00  9.50  9.75 10.00 13.00  9.50

$passive
[1] 11.00 10.00 10.00 11.75 10.50 15.00

$none
[1] 11.50 12.00  9.00 11.50 13.25 13.00

$ctr.8w
[1] 13.25 11.50 12.00 13.50 11.50

> walk <- unlist(zelazo)
> walk
 active1  active2  active3  active4  active5  active6 passive1 passive2 
    9.00     9.50     9.75    10.00    13.00     9.50    11.00    10.00 
passive3 passive4 passive5 passive6    none1    none2    none3    none4 
   10.00    11.75    10.50    15.00    11.50    12.00     9.00    11.50 
   none5    none6  ctr.8w1  ctr.8w2  ctr.8w3  ctr.8w4  ctr.8w5 
   13.25    13.00    13.25    11.50    12.00    13.50    11.50 
> group <- factor(rep(1:4,c(6,6,6,5)), labels=names(zelazo))
> group
 [1] active  active  active  active  active  active  passive passive passive
[10] passive passive passive none    none    none    none    none    none
[19] ctr.8w  ctr.8w  ctr.8w  ctr.8w  ctr.8w
Levels: active passive none ctr.8w
> summary(lm(walk ~ group))

Call:
lm(formula = walk ~ group)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.7083 -0.8500 -0.3500  0.6375  3.6250 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   10.1250     0.6191  16.355 1.19e-12 ***
grouppassive   1.2500     0.8755   1.428   0.1696    
groupnone      1.5833     0.8755   1.809   0.0864 .  
groupctr.8w    2.2250     0.9182   2.423   0.0255 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 1.516 on 19 degrees of freedom
Multiple R-Squared: 0.2528,     Adjusted R-squared: 0.1348 
F-statistic: 2.142 on 3 and 19 DF,  p-value: 0.1285 

>
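The "relevant tests" the exercise mentions can be read off an analysis-of-variance table. A minimal sketch, assuming the walk and group objects built in the transcript above (retyped here so the block runs on its own). anova() on the fitted model gives the overall F test that all four group means are equal. This is the same F statistic (2.142 on 3 and 19 DF, p-value 0.1285) printed on the last line of summary(). oneway.test() with var.equal = TRUE is included only as a cross-check, since it fits the same one-way ANOVA.

```r
zelazo <- list(
  active  = c(9.00, 9.50, 9.75, 10.00, 13.00, 9.50),
  passive = c(11.00, 10.00, 10.00, 11.75, 10.50, 15.00),
  none    = c(11.50, 12.00, 9.00, 11.50, 13.25, 13.00),
  ctr.8w  = c(13.25, 11.50, 12.00, 13.50, 11.50)
)
walk  <- unlist(zelazo)
group <- factor(rep(1:4, c(6, 6, 6, 5)), labels = names(zelazo))

fit <- lm(walk ~ group)

# Overall test of "all four group means are equal"
tab <- anova(fit)
print(tab)

# Cross-check: classical one-way ANOVA, equal variances assumed
ow <- oneway.test(walk ~ group, var.equal = TRUE)
print(ow)
```

In plain terms, the F test asks whether the four group means differ by more than the scatter within each group would lead one to expect. Here p is about 0.13, so that overall null hypothesis is not rejected at the usual 5% level.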