[R] how lm behaves
Peter Palenchar
peter.palenchar at villanova.edu
Thu Jun 7 21:24:50 CEST 2012
I was wondering if somebody could explain why I get different results here:
>treats[,2]<-as.factor(treats[,2])
>treats[,5]<-as.factor(treats[,5])
>treats[,4]<-as.factor(treats[,4])
#there are 'c' on more days than I have 'h2o2', where treats[,4] is the day. I only want 'c' that correspond to the same days that I have a 'h2o2' also.
>z<-treats[,3] == 'h2o2'
>x<-treats[,4] %in% treats[z,4]
>a<-treats[,3] == 'c'
>aa<-which(a)
>xx<-which(x)
>zz<-which(z)
>aa<-intersect(aa, xx)
>aa<-c(aa, zz)
>a<- count[aa]
>x<-as.vector(treats[aa,2])
>y<-as.vector(treats[aa,4])
>b<-as.vector(treats[aa,5])
>data1<-cbind(a,x,y,b)
>data1<-as.data.frame(data1)
>data1[,'a']<-as.integer(levels(data1[,'a'])[data1[,'a']])
>mo2<-lm(count[aa]~treats[aa,2]*treats[aa,4]*treats[aa,5]- treats[aa,2]:treats[aa,4]:treats[aa,5])
>summary(mo2)
Call:
lm(formula = count[aa] ~ treats[aa, 2] * treats[aa, 4] * treats[aa,
5] - treats[aa, 2]:treats[aa, 4]:treats[aa, 5])
Residuals:
Min 1Q Median 3Q Max
-70.000 -22.244 0.422 17.292 70.000
Coefficients: (13 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.955e+02 4.038e+01 9.792 1.77e-09 ***
treats[aa, 2]11 1.600e+00 4.860e+01 0.033 0.974034
treats[aa, 2]12 -8.200e+01 4.860e+01 -1.687 0.105692
treats[aa, 4]5.15 -2.279e+02 5.303e+01 -4.298 0.000292 ***
treats[aa, 4]5.2 -5.033e+01 5.303e+01 -0.949 0.352838
treats[aa, 4]5.21 2.111e+01 5.303e+01 0.398 0.694384
treats[aa, 4]5.29 -4.922e+01 5.303e+01 -0.928 0.363360
treats[aa, 4]6.11 1.016e+01 5.941e+01 0.171 0.865787
treats[aa, 4]6.17 -9.518e+01 5.941e+01 -1.602 0.123390
treats[aa, 4]6.18 5.566e+01 5.941e+01 0.937 0.358971
treats[aa, 4]6.5 5.249e+01 5.941e+01 0.884 0.386458
treats[aa, 5]5.7 -8.988e-14 4.860e+01 0.000 1.000000
treats[aa, 5]38 -2.554e+02 4.860e+01 -5.255 2.85e-05 ***
treats[aa, 5]570 -4.009e+02 5.031e+01 -7.969 6.29e-08 ***
treats[aa, 2]11:treats[aa, 4]5.15 -4.100e+01 5.809e+01 -0.706 0.487713
treats[aa, 2]12:treats[aa, 4]5.15 1.297e+02 5.809e+01 2.232 0.036103 *
treats[aa, 2]11:treats[aa, 4]5.2 -6.300e+01 5.809e+01 -1.085 0.289869
treats[aa, 2]12:treats[aa, 4]5.2 2.740e+02 5.809e+01 4.717 0.000105 ***
treats[aa, 2]11:treats[aa, 4]5.21 5.667e+00 5.809e+01 0.098 0.923172
treats[aa, 2]12:treats[aa, 4]5.21 1.170e+02 5.809e+01 2.014 0.056382 .
treats[aa, 2]11:treats[aa, 4]5.29 -1.647e+02 5.809e+01 -2.835 0.009643 **
treats[aa, 2]12:treats[aa, 4]5.29 -7.667e+00 5.809e+01 -0.132 0.896199
treats[aa, 2]11:treats[aa, 4]6.11 6.577e+01 7.433e+01 0.885 0.385801
treats[aa, 2]12:treats[aa, 4]6.11 4.775e+01 7.433e+01 0.642 0.527269
treats[aa, 2]11:treats[aa, 4]6.17 3.627e+01 7.433e+01 0.488 0.630377
treats[aa, 2]12:treats[aa, 4]6.17 2.725e+01 7.433e+01 0.367 0.717427
treats[aa, 2]11:treats[aa, 4]6.18 -9.073e+01 7.433e+01 -1.221 0.235193
treats[aa, 2]12:treats[aa, 4]6.18 -1.553e+02 7.433e+01 -2.089 0.048534 *
treats[aa, 2]11:treats[aa, 4]6.5 -1.257e+02 7.433e+01 -1.691 0.104888
treats[aa, 2]12:treats[aa, 4]6.5 -1.507e+02 7.433e+01 -2.028 0.054838 .
treats[aa, 2]11:treats[aa, 5]5.7 -1.840e+01 4.500e+01 -0.409 0.686546
treats[aa, 2]12:treats[aa, 5]5.7 -5.960e+01 4.500e+01 -1.325 0.198909
treats[aa, 2]11:treats[aa, 5]38 9.560e+01 4.500e+01 2.125 0.045092 *
treats[aa, 2]12:treats[aa, 5]38 2.860e+01 4.500e+01 0.636 0.531583
treats[aa, 2]11:treats[aa, 5]570 9.525e+01 5.031e+01 1.893 0.071534 .
treats[aa, 2]12:treats[aa, 5]570 2.255e+02 5.031e+01 4.483 0.000186 ***
treats[aa, 4]5.15:treats[aa, 5]5.7 8.767e+01 5.809e+01 1.509 0.145483
treats[aa, 4]5.2:treats[aa, 5]5.7 4.333e+00 5.809e+01 0.075 0.941209
treats[aa, 4]5.21:treats[aa, 5]5.7 4.200e+01 5.809e+01 0.723 0.477281
treats[aa, 4]5.29:treats[aa, 5]5.7 -5.700e+01 5.809e+01 -0.981 0.337138
treats[aa, 4]6.11:treats[aa, 5]5.7 NA NA NA NA
treats[aa, 4]6.17:treats[aa, 5]5.7 NA NA NA NA
treats[aa, 4]6.18:treats[aa, 5]5.7 NA NA NA NA
treats[aa, 4]6.5:treats[aa, 5]5.7 NA NA NA NA
treats[aa, 4]5.15:treats[aa, 5]38 9.500e+01 5.809e+01 1.635 0.116190
treats[aa, 4]5.2:treats[aa, 5]38 -2.433e+01 5.809e+01 -0.419 0.679354
treats[aa, 4]5.21:treats[aa, 5]38 -9.633e+01 5.809e+01 -1.658 0.111434
treats[aa, 4]5.29:treats[aa, 5]38 2.067e+01 5.809e+01 0.356 0.725398
treats[aa, 4]6.11:treats[aa, 5]38 NA NA NA NA
treats[aa, 4]6.17:treats[aa, 5]38 NA NA NA NA
treats[aa, 4]6.18:treats[aa, 5]38 NA NA NA NA
treats[aa, 4]6.5:treats[aa, 5]38 NA NA NA NA
treats[aa, 4]5.15:treats[aa, 5]570 NA NA NA NA
treats[aa, 4]5.2:treats[aa, 5]570 NA NA NA NA
treats[aa, 4]5.21:treats[aa, 5]570 NA NA NA NA
treats[aa, 4]5.29:treats[aa, 5]570 NA NA NA NA
treats[aa, 4]6.11:treats[aa, 5]570 -5.433e+01 5.809e+01 -0.935 0.359766
treats[aa, 4]6.17:treats[aa, 5]570 9.433e+01 5.809e+01 1.624 0.118632
treats[aa, 4]6.18:treats[aa, 5]570 -4.333e+00 5.809e+01 -0.075 0.941209
treats[aa, 4]6.5:treats[aa, 5]570 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 50.31 on 22 degrees of freedom
Multiple R-squared: 0.9655, Adjusted R-squared: 0.8934
F-statistic: 13.39 on 46 and 22 DF, p-value: 7.819e-09
>mo<-lm(a~x*y*b-x:y:b,data = data1)
>summary(mo)
Call:
lm(formula = a ~ x * y * b - x:y:b, data = data1)
Residuals:
Min 1Q Median 3Q Max
-70.000 -22.244 0.422 17.292 70.000
Coefficients: (13 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 397.067 40.385 9.832 1.64e-09 ***
x12 -83.600 48.601 -1.720 0.099446 .
x9 -1.600 48.601 -0.033 0.974034
y5.15 -268.889 53.028 -5.071 4.44e-05 ***
y5.2 -113.333 53.028 -2.137 0.043944 *
y5.21 26.778 53.028 0.505 0.618598
y5.29 -213.889 53.028 -4.034 0.000556 ***
y6.11 75.933 59.406 1.278 0.214494
y6.17 -58.900 59.406 -0.991 0.332227
y6.18 -35.067 59.406 -0.590 0.561009
y6.5 -73.233 59.406 -1.233 0.230673
b38 -159.800 48.601 -3.288 0.003356 **
b5.7 -18.400 48.601 -0.379 0.708619
b570 -305.667 50.307 -6.076 4.08e-06 ***
x12:y5.15 170.667 58.089 2.938 0.007610 **
x9:y5.15 41.000 58.089 0.706 0.487713
x12:y5.2 337.000 58.089 5.801 7.76e-06 ***
x9:y5.2 63.000 58.089 1.085 0.289869
x12:y5.21 111.333 58.089 1.917 0.068371 .
x9:y5.21 -5.667 58.089 -0.098 0.923172
x12:y5.29 157.000 58.089 2.703 0.012998 *
x9:y5.29 164.667 58.089 2.835 0.009643 **
x12:y6.11 -18.025 74.334 -0.242 0.810649
x9:y6.11 -65.775 74.334 -0.885 0.385801
x12:y6.17 -9.025 74.334 -0.121 0.904467
x9:y6.17 -36.275 74.334 -0.488 0.630377
x12:y6.18 -64.525 74.334 -0.868 0.394741
x9:y6.18 90.725 74.334 1.221 0.235193
x12:y6.5 -25.025 74.334 -0.337 0.739566
x9:y6.5 125.725 74.334 1.691 0.104888
x12:b38 -67.000 44.996 -1.489 0.150676
x9:b38 -95.600 44.996 -2.125 0.045092 *
x12:b5.7 -41.200 44.996 -0.916 0.369782
x9:b5.7 18.400 44.996 0.409 0.686546
x12:b570 130.250 50.307 2.589 0.016744 *
x9:b570 -95.250 50.307 -1.893 0.071534 .
y5.15:b38 95.000 58.089 1.635 0.116190
y5.2:b38 -24.333 58.089 -0.419 0.679354
y5.21:b38 -96.333 58.089 -1.658 0.111434
y5.29:b38 20.667 58.089 0.356 0.725398
y6.11:b38 NA NA NA NA
y6.17:b38 NA NA NA NA
y6.18:b38 NA NA NA NA
y6.5:b38 NA NA NA NA
y5.15:b5.7 87.667 58.089 1.509 0.145483
y5.2:b5.7 4.333 58.089 0.075 0.941209
y5.21:b5.7 42.000 58.089 0.723 0.477281
y5.29:b5.7 -57.000 58.089 -0.981 0.337138
y6.11:b5.7 NA NA NA NA
y6.17:b5.7 NA NA NA NA
y6.18:b5.7 NA NA NA NA
y6.5:b5.7 NA NA NA NA
y5.15:b570 NA NA NA NA
y5.2:b570 NA NA NA NA
y5.21:b570 NA NA NA NA
y5.29:b570 NA NA NA NA
y6.11:b570 -54.333 58.089 -0.935 0.359766
y6.17:b570 94.333 58.089 1.624 0.118632
y6.18:b570 -4.333 58.089 -0.075 0.941209
y6.5:b570 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 50.31 on 22 degrees of freedom
Multiple R-squared: 0.9655, Adjusted R-squared: 0.8934
F-statistic: 13.39 on 46 and 22 DF, p-value: 7.819e-09
Why don't I get the same result for mo and mo2?
Does it really matter if the data is in a data frame or not?
As near as I can tell, everything appears to be the same other than that:
> class(count)
[1] "integer"
> class(data1[,'a'])
[1] "integer"
> class(x)
[1] "logical"
> class(treats[,2])
[1] "factor"
> class(count)
[1] "integer"
> class(data1[,'a'])
[1] "integer"
> class(data1[,'x'])
[1] "factor"
> class(treats[,2])
[1] "factor"
> class(data1[,'y'])
[1] "factor"
> class(treats[,4])
[1] "factor"
> class(data1[,'b'])
[1] "factor"
> class(treats[,5])
[1] "factor"
Except that for treats[,4], I have more levels than for data1[,'y']. However, I've only put treats[aa, 4] into the model and treats[aa,4] has the same values as data1[,'y'].
> data1[,'y']==treats[aa,4]
Error in Ops.factor(data1[, "y"], treats[aa, 4]) :
level sets of factors are different
Is the model looking at all the possible levels even though it isn't being given information on them all (i.e. I'm limiting it to treats[aa,4])?
Peter
More information about the R-help
mailing list