[R] lm coefficients output confusing

Thu Aug 13 23:48:05 CEST 2009

On Thu, 13 Aug 2009, Ross Culloch wrote:

> 
> Hi all,
> 
> I have an issue with the lm() function regarding the listing of the
> coefficients. My data are below, showing a list of hours (HR) relating to
> the time spent resting (R) by an individual animal. Simply i want to run a
> lm() to run in an anova() to see if there is a significant difference in
> resting between hours.

The problem is not with lm(), but with your data.

The listing below does not precisely represent your data; when I copy it 
to the clipboard and then use

 	dat <- read.table("clipboard")
 	lm(R~HR,dat)

I get a different result.

OTOH, when I use

 	summary(lm(R~as.factor(HR),rdata2))

I recap your results (up to labelling).

I suggest you try

 	str( rdata2 )

to get more insight into your data. I suspect that one or more of the 
values in HR was quoted in your data file or that you used colClasses to 
determine the class of data in each column and turned HR into a factor.

HTH,

Chuck

>
>    HR         R
> 1   2 0.6666667
> 2   2 0.4666667
> 3   2 0.8000000
> 4   2 0.6333333
> 5   2 0.7333333
> 6   2 0.8000000
> 7   2 0.8666667
> 8   2 0.7857143
> 9   2 0.7826087
> 10  2 0.6666667
> 11  2 0.9166667
> 12  2 0.6666667
> 13  3 0.5294118
> 14  3 0.8541667
> 15  3 0.4583333
> 16  3 0.5882353
> 17  3 0.9347826
> 18  3 0.7878788
> 19  3 0.7857143
> 20  3 0.6944444
> 21  3 0.8333333
> 22  3 0.7450980
> 23  3 0.9230769
> 24  3 0.7222222
> 25  4 0.6571429
> 26  4 0.7241379
> 27  4 0.7391304
> 28  4 0.6571429
> 29  4 0.8000000
> 30  4 0.9130435
> 31  4 0.7187500
> 32  4 0.8437500
> 33  4 0.9230769
> 34  4 0.8571429
> 35  4 0.8695652
> 36  4 0.8888889
> 37  5 0.3333333
> 38  5 0.5365854
> 39  5 0.6774194
> 40  5 0.7142857
> 41  5 0.6904762
> 42  5 0.5483871
> 43  5 0.5952381
> 44  5 0.4166667
> 45  5 0.5666667
> 46  5 0.5952381
> 47  5 0.7894737
> 48  5 0.7500000
> 49  6 0.6268657
> 50  6 0.7187500
> 51  6 0.5500000
> 52  6 0.7164179
> 53  6 0.7656250
> 54  6 0.5869565
> 55  6 0.7164179
> 56  6 0.7031250
> 57  6 0.7230769
> 58  6 0.7462687
> 59  6 0.9200000
> 60  6 0.8536585
> 61  7 0.6379310
> 62  7 0.5357143
> 63  7 0.5227273
> 64  7 0.8000000
> 65  7 0.6724138
> 66  7 0.7083333
> 67  7 0.7241379
> 68  7 0.6938776
> 69  7 0.6545455
> 70  7 0.7931034
> 71  7 0.7560976
> 72  7 0.8684211
> 73  8 0.6727273
> 74  8 0.6000000
> 75  8 0.8333333
> 76  8 0.8181818
> 77  8 0.7818182
> 78  8 0.7647059
> 79  8 0.5818182
> 80  8 0.5918367
> 81  8 0.7450980
> 82  8 0.7818182
> 83  8 0.8048780
> 84  8 0.8684211
> 
> 
> The script i'm using and output is as follows:
> 
> > anova(rdayml <- lm(R ~ HR, data=rdata2, na.action=na.exclude)) 
> Analysis of Variance Table
> 
> Response: R
>           Df  Sum Sq Mean Sq F value  Pr(>F) 
> HR         6 0.25992 0.04332  3.1762 0.00774 **
> Residuals 77 1.05021 0.01364 
> ---
> Signif. codes:  0 â€˜***â€™ 0.001 â€˜**â€™ 0.01 â€˜*â€™ 0.05 â€˜.â€™ 0.1 â€˜ â€™ 1 
> > 
> > summary(rdayml <- lm(R ~ HR,data=rdata2))
> 
> Call:
> lm(formula = R ~ HR, data = rdata2)
> 
> Residuals:
>       Min        1Q    Median        3Q       Max 
> -0.279725 -0.065416  0.005593  0.077486  0.201070 
> 
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|) 
> (Intercept)  0.732082   0.033713  21.715   <2e-16 ***
> HR3          0.005976   0.047678   0.125   0.9006 
> HR4          0.067232   0.047678   1.410   0.1625 
> HR5         -0.130935   0.047678  -2.746   0.0075 ** 
> HR6         -0.013152   0.047678  -0.276   0.7834 
> HR7         -0.034807   0.047678  -0.730   0.4676 
> HR8          0.004971   0.047678   0.104   0.9172 
> ---
> Signif. codes:  0 â€˜***â€™ 0.001 â€˜**â€™ 0.01 â€˜*â€™ 0.05 â€˜.â€™ 0.1 â€˜ â€™ 1 
> 
> Residual standard error: 0.1168 on 77 degrees of freedom
> Multiple R-squared: 0.1984,     Adjusted R-squared: 0.1359 
> F-statistic: 3.176 on 6 and 77 DF,  p-value: 0.00774 
> 
> 
> What i really don't understand is why the lm summary lists the hour numbers
> in the coefficient of the lm, as apposed to just reading HR? On top of that
> if R does display the data like this then i don't understand why it omits
> hour 2? If i can get this to work correctly can I use the p value to
> determine which of the hours is significantly different to the others - so
> in this example hour 5 is significantly different? Or is it just a case of
> using the p value from the anova to determine that there is a significant
> difference between hours (in this case) and use a plot to determine which
> hour(s) are likely to be the cause?
> 
> Any help or advice would be most useful!
> 
> Best wishes,
> 
> Ross
> 
> 
> -- 
> View this message in context: http://www.nabble.com/lm-coefficients-output-confusing-tp24958398p24958398.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901