[R-sig-ME] standard error and statistical significance in lmer versus lm
Vincenzo Lagani
vlagani at ics.forth.gr
Fri Jan 2 19:06:32 CET 2015
Dear prof. Bolker,
thanks a lot for your answer. I have not been able to observe any
definitive pattern by following your suggestion:
One thing that might be interesting would be plotting the residuals from
a model of only probeset variation (i.e., values ~ (week|probeset))
and seeing how the week*genotype pattern was (hopefully) clarified.
Could you please be a little bit more specific about how the residuals
from the model you suggest should be plotted?
Here the summary of the model, for yours and other readers perusal:
>lme4Model.random <- lmer(values ~ (week|probesets), data = dataset, REML = FALSE);
>summary(lme4Model.random)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: values ~ (week | probesets)
Data: dataset
AIC BIC logLik deviance df.resid
172535.9 172590.6 -86263.0 172525.9 414340
Scaled residuals:
Min 1Q Median 3Q Max
-11.2119 -0.5763 -0.0112 0.5812 21.3413
Random effects:
Groups Name Variance Std.Dev. Corr
probesets (Intercept) 5.166933 2.27309
week 0.004412 0.06642 -0.25
Residual 0.061336 0.24766
Number of obs: 414345, groups: probesets, 18015
Fixed effects:
Estimate Std. Error t value
(Intercept) 6.64697 0.01662 400
Thanks again,
Vincenzo
On 12/27/2014 4:33 AM, Ben Bolker wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 14-12-26 02:11 PM, Vincenzo Lagani wrote:
>> Dear all,
>>
>> I am modelling gene expression data with mixed models using lme4.
>> My goal is to assess whether gene expression globally decreases or
>> increases with time.
>>
>> Specifically, the data consist in whole expression profiles
>> measured at five different time points in two different strains of
>> mice. For each time point and each strain there are three
>> replicates. The data are not longitudinal, i.e., different mice are
>> used at each time point. We had to remove one profile from a time
>> point because it was not matching our quality criteria, so the data
>> became not properly balanced.
>>
>> On these data I am fitting the following model:
>>
>> lme4Model.full <- lmer(values ~ week * genotype + (week |
>> probesets), data = dataset, REML = FALSE)
>>
>> where 'values' stands for the gene expression, 'week' is a scaled
>> numeric reporting the age of the mice in weeks, and 'genotype' is a
>> factor with two levels representing the two different genetic
>> backgrounds. Each single gene is let having its own random
>> intercept and slope.
>>
>> What puzzles me is that when I compare this model with its simpler,
>> non-mixed version:
>>
>> lmModel.full <- lm(values ~ week * genotype, data = dataset)
>>
>> I obtain the same coefficients but different standard errors (see
>> below). Furthermore, while the interaction coefficient is not
>> significant in the simple linear model, it becomes highly
>> significant in the mixed model, at least according to these ANOVA
>> tests:
>>
>> lmModel <- lm(values ~ week + genotype, data = dataset) lme4Model
>> <- lmer(values ~ week + genotype + (week | probesets), data =
>> dataset, REML = FALSE)
>>
>>> anova(lmModel.full, lmModel)
>> Analysis of Variance Table
>>
>> Model 1: values ~ week * genotype Model 2: values ~ week +
>> genotype Res.Df RSS Df Sum of Sq F Pr(>F) 1 414341
>> 2162664 2 414342 2162666 -1 -2.4697 0.4732 0.4915
>>
>>> anova(lme4Model.full, lme4Model)
>> Data: dataset Models: lme4Model: values ~ week + genotype + (week |
>> probesets) lme4Model.full: values ~ week * genotype + (week |
>> probesets) Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
>> lme4Model 7 72376 72452 -36181 72362 lme4Model.full 8
>> 72325 72413 -36155 72309 52.472 1 4.364e-13 ***
>>
>> Assessing whether the interaction coefficient is significant is
>> actually the aim of my study, and having two totally different
>> answers confuses me. My understanding is that the mixed model
>> better catches the variance structure of the data and thus it is
>> able to better estimate the standard errors and p-values of the
>> coefficients. Is this correct? In other words, can I confidently
>> claim that the p-values obtained from the mixed models are "the
>> correct ones" and that the interaction term is actually
>> significant?
> As far as I can tell from what you've posted, the result given by
> lme4 is indeed (or certainly could be/I have no reason to believe it
> is not) correct. I'm not sure of the best way to explain the result to
> you, though. Adding the variation among probesets to the model does
> indeed explain a lot of variation that would otherwise end up being
> modeled as error and filtering into the standard errors. It would be
> nice to understand the differences by visualizing the different
> models, but with such a large data set it could be challenging ...
> One thing that might be interesting would be plotting the residuals from
> a model of only probeset variation (i.e., values ~ (week|probeset))
> and seeing how the week*genotype pattern was (hopefully) clarified.
>
>> My apologies if this issue has been already posted on this list.
>> Despite having seen multiple posts here on similar topics, I have
>> not been able to find an answer to these questions.
>>
>> Thanks in advance for your help. Any suggestion is very welcome.
>>
>> Regards,
>>
>> Vincenzo
>>
>>
>>> summary(lme4Model.full)
>> Linear mixed model fit by maximum likelihood ['lmerMod'] Formula:
>> values ~ week * genotype + (week | probesets) Data: dataset
>>
>> AIC BIC logLik deviance df.resid 72325.3 72412.8 -36154.7
>> 72309.3 414337
>>
>> Scaled residuals: Min 1Q Median 3Q Max -13.2622
>> -0.5246 0.0128 0.5270 23.3456
>>
>> Random effects: Groups Name Variance Std.Dev. Corr
>> probesets (Intercept) 5.167338 2.27318 week 0.005029 0.07092
>> -0.23 Residual 0.047063 0.21694 Number of obs: 414345,
>> groups: probesets, 18015
>>
>> Fixed effects: Estimate Std. Error t value (Intercept)
>> 6.5727983 0.0169414 388.0 week -0.0151336 0.0006823
>> -22.2 genotypeCSB 0.2405300 0.0007121 337.8
>> week:genotypeCSB 0.0050430 0.0006962 7.2
>>
>> Correlation of Fixed Effects: (Intr) week gntCSB week
>> -0.177 genotypeCSB -0.015 -0.028 wk:gntypCSB -0.001 -0.392 -0.054
>>
>>
>>> summary(lmModel.full)
>> Call: lm(formula = values ~ week * genotype, data = dataset)
>>
>> Residuals: Min 1Q Median 3Q Max -5.7560 -1.9881
>> 0.0083 1.6823 7.6407
>>
>> Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept)
>> 6.572798 0.004407 1491.384 < 2e-16 *** week
>> -0.015134 0.004547 -3.329 0.000873 *** genotypeCSB
>> 0.240530 0.007500 32.072 < 2e-16 *** week:genotypeCSB
>> 0.005043 0.007331 0.688 0.491537 --- Signif. codes: 0 ‘***’
>> 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>> Residual standard error: 2.285 on 414341 degrees of freedom
>> Multiple R-squared: 0.002491, Adjusted R-squared: 0.002484
>> F-statistic: 344.9 on 3 and 414341 DF, p-value: < 2.2e-16
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
>
> iQEcBAEBAgAGBQJUniiHAAoJEOCV5YRblxUH5GEH/R57rvnCi7OYj0HEV+1dQ9sv
> awXYPd9/q7t2ZPVyrJ8OdVL2+ntVe7KYKFy28D2uRa5eyyH6/jaoy9nSlGI4Gvd0
> dslGQYlIpSw9LmOHY1BPcQYZuqEoJoHlbonbX+00AwgANdanP0CpSWNzNVRmbcUL
> ftPAErAaJecb7yu56+I2Yz5ugN3NYrqNdWvTV/HYxt5emjx45gQdQd4cQRTfKw3n
> JkcM8DMRrOjvN6w2H8Pgps/yE3W+nx5VsgBaSdmagwTIfke6aZ90+55jMhjbDXNJ
> xsbFzPc3CUA5SUO7qQHwoteMz8QlfRp3D8SgfmdfaPixskWCdMHjxAws+0ePAhw=
> =0jkf
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list