[R] lm coefficients output confusing
Charles C. Berry
cberry at tajo.ucsd.edu
Thu Aug 13 23:48:05 CEST 2009
On Thu, 13 Aug 2009, Ross Culloch wrote:
>
> Hi all,
>
> I have an issue with the lm() function regarding the listing of the
> coefficients. My data are below, showing a list of hours (HR) relating to
> the time spent resting (R) by an individual animal. I simply want to fit
> an lm() and pass it to anova() to see if there is a significant difference
> in resting between hours.
The problem is not with lm(), but with your data.
The listing below does not precisely represent your data; when I copy it
to the clipboard and then use
dat <- read.table("clipboard")
lm(R~HR,dat)
I get a different result.
OTOH, when I use
summary(lm(R~as.factor(HR),dat))
I reproduce your results (up to labelling).
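To make the difference concrete, here is a minimal sketch on made-up data. The data frame `toy` and its values are hypothetical, not the poster's data; only the layout (hours 2 to 8, 12 observations each) mirrors the listing. It shows how the same formula gives one slope when HR is numeric and one coefficient per non-baseline hour when HR is a factor.

```r
# Made-up data for illustration only (not the poster's values):
# 7 hours (2..8), 12 observations each, as in the original layout.
set.seed(1)
toy <- data.frame(HR = rep(2:8, each = 12),
                  R  = runif(84, 0.3, 0.95))

# HR numeric: one intercept and one slope for HR (1 df for HR)
fit_num <- lm(R ~ HR, data = toy)
names(coef(fit_num))    # "(Intercept)" and "HR"

# HR as a factor: one coefficient per hour except the first level (2).
# Under R's default treatment contrasts the first level is the baseline
# and is absorbed into the intercept, so HR uses 6 df.
fit_fac <- lm(R ~ as.factor(HR), data = toy)
names(coef(fit_fac))    # "(Intercept)", "as.factor(HR)3", ..., "as.factor(HR)8"

anova(fit_num)          # HR has Df = 1
anova(fit_fac)          # as.factor(HR) has Df = 6
```

In the factor fit each coefficient is the difference between that hour's mean and the mean of the baseline hour, which would explain both the per-hour labels and the absence of a row for hour 2 in the quoted summary.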
I suggest you try
str( rdata2 )
to get more insight into your data. I suspect that one or more of the
values in HR was quoted in your data file or that you used colClasses to
determine the class of data in each column and turned HR into a factor.
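As a rough illustration of the colClasses case, the sketch below reads the same small, made-up table twice. The string `txt` and the object names are hypothetical, and this is only one way a column can end up as a factor; it is not a claim about how rdata2 was actually created.

```r
# Toy input (hypothetical, not the poster's file)
txt <- "HR R
2 0.67
2 0.47
3 0.53
3 0.85"

# Default reading: HR is detected as integer
d_num <- read.table(text = txt, header = TRUE)
str(d_num)

# Forcing the column class turns HR into a factor
d_fac <- read.table(text = txt, header = TRUE,
                    colClasses = c("factor", "numeric"))
str(d_fac)

# To get numbers back from a factor, convert via the labels;
# as.numeric() applied directly would return the internal level codes.
d_fac$HR_num <- as.numeric(as.character(d_fac$HR))
str(d_fac)
```

If str( rdata2 ) reports HR as a Factor, a conversion along these lines gives a numeric column to use when a single slope for HR is what is wanted.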
HTH,
Chuck
>
> HR R
> 1 2 0.6666667
> 2 2 0.4666667
> 3 2 0.8000000
> 4 2 0.6333333
> 5 2 0.7333333
> 6 2 0.8000000
> 7 2 0.8666667
> 8 2 0.7857143
> 9 2 0.7826087
> 10 2 0.6666667
> 11 2 0.9166667
> 12 2 0.6666667
> 13 3 0.5294118
> 14 3 0.8541667
> 15 3 0.4583333
> 16 3 0.5882353
> 17 3 0.9347826
> 18 3 0.7878788
> 19 3 0.7857143
> 20 3 0.6944444
> 21 3 0.8333333
> 22 3 0.7450980
> 23 3 0.9230769
> 24 3 0.7222222
> 25 4 0.6571429
> 26 4 0.7241379
> 27 4 0.7391304
> 28 4 0.6571429
> 29 4 0.8000000
> 30 4 0.9130435
> 31 4 0.7187500
> 32 4 0.8437500
> 33 4 0.9230769
> 34 4 0.8571429
> 35 4 0.8695652
> 36 4 0.8888889
> 37 5 0.3333333
> 38 5 0.5365854
> 39 5 0.6774194
> 40 5 0.7142857
> 41 5 0.6904762
> 42 5 0.5483871
> 43 5 0.5952381
> 44 5 0.4166667
> 45 5 0.5666667
> 46 5 0.5952381
> 47 5 0.7894737
> 48 5 0.7500000
> 49 6 0.6268657
> 50 6 0.7187500
> 51 6 0.5500000
> 52 6 0.7164179
> 53 6 0.7656250
> 54 6 0.5869565
> 55 6 0.7164179
> 56 6 0.7031250
> 57 6 0.7230769
> 58 6 0.7462687
> 59 6 0.9200000
> 60 6 0.8536585
> 61 7 0.6379310
> 62 7 0.5357143
> 63 7 0.5227273
> 64 7 0.8000000
> 65 7 0.6724138
> 66 7 0.7083333
> 67 7 0.7241379
> 68 7 0.6938776
> 69 7 0.6545455
> 70 7 0.7931034
> 71 7 0.7560976
> 72 7 0.8684211
> 73 8 0.6727273
> 74 8 0.6000000
> 75 8 0.8333333
> 76 8 0.8181818
> 77 8 0.7818182
> 78 8 0.7647059
> 79 8 0.5818182
> 80 8 0.5918367
> 81 8 0.7450980
> 82 8 0.7818182
> 83 8 0.8048780
> 84 8 0.8684211
>
>
> The script i'm using and output is as follows:
>
> > anova(rdayml <- lm(R ~ HR, data=rdata2, na.action=na.exclude))
> Analysis of Variance Table
>
> Response: R
>           Df  Sum Sq Mean Sq F value  Pr(>F)
> HR         6 0.25992 0.04332  3.1762 0.00774 **
> Residuals 77 1.05021 0.01364
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >
> > summary(rdayml <- lm(R ~ HR,data=rdata2))
>
> Call:
> lm(formula = R ~ HR, data = rdata2)
>
> Residuals:
>       Min        1Q    Median        3Q       Max
> -0.279725 -0.065416  0.005593  0.077486  0.201070
>
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept)  0.732082   0.033713  21.715   <2e-16 ***
> HR3          0.005976   0.047678   0.125   0.9006
> HR4          0.067232   0.047678   1.410   0.1625
> HR5         -0.130935   0.047678  -2.746   0.0075 **
> HR6         -0.013152   0.047678  -0.276   0.7834
> HR7         -0.034807   0.047678  -0.730   0.4676
> HR8          0.004971   0.047678   0.104   0.9172
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.1168 on 77 degrees of freedom
> Multiple R-squared: 0.1984, Adjusted R-squared: 0.1359
> F-statistic: 3.176 on 6 and 77 DF, p-value: 0.00774
>
>
> What I really don't understand is why the lm summary lists the hour numbers
> in the coefficients of the lm, as opposed to just reading HR? On top of that,
> if R does display the data like this then I don't understand why it omits
> hour 2? If I can get this to work correctly, can I use the p value to
> determine which of the hours is significantly different from the others - so
> in this example hour 5 is significantly different? Or is it just a case of
> using the p value from the anova to determine that there is a significant
> difference between hours (in this case) and using a plot to determine which
> hour(s) are likely to be the cause?
>
> Any help or advice would be most useful!
>
> Best wishes,
>
> Ross
>
>
> --
> View this message in context: http://www.nabble.com/lm-coefficients-output-confusing-tp24958398p24958398.html
> Sent from the R help mailing list archive at Nabble.com.
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901