[R-sig-ME] standard error and statistical significance in lmer versus lm

Fri Jan 2 19:06:32 CET 2015

Dear prof. Bolker,

thanks a lot for your answer. I have not been able to observe any 
definitive pattern by following your suggestion:

One thing that might be interesting would be plotting the residuals from
a model of only probeset variation (i.e., values ~ (week|probeset))
and seeing how the week*genotype pattern was (hopefully) clarified.

Could you please be a little bit more specific about how the residuals 
from the model you suggest should be plotted?

Here the summary of the model, for yours and other readers perusal:

>lme4Model.random <- lmer(values ~ (week|probesets), data = dataset, REML = FALSE);

>summary(lme4Model.random)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: values ~ (week | probesets)
    Data: dataset

      AIC      BIC   logLik deviance df.resid
172535.9 172590.6 -86263.0 172525.9   414340

Scaled residuals:
      Min       1Q   Median       3Q      Max
-11.2119  -0.5763  -0.0112   0.5812  21.3413

Random effects:
  Groups    Name        Variance Std.Dev. Corr
  probesets (Intercept) 5.166933 2.27309
            week        0.004412 0.06642  -0.25
  Residual              0.061336 0.24766
Number of obs: 414345, groups:  probesets, 18015

Fixed effects:
             Estimate Std. Error t value
(Intercept)  6.64697    0.01662     400

Thanks again,

Vincenzo

On 12/27/2014 4:33 AM, Ben Bolker wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 14-12-26 02:11 PM, Vincenzo Lagani wrote:
>> Dear all,
>>
>> I am modelling gene expression data with mixed models using lme4.
>> My goal is to assess whether gene expression globally decreases or
>>   increases with time.
>>
>> Specifically, the data consist in whole expression profiles
>> measured at five different time points in two different strains of
>> mice. For each time point and each strain there are three
>> replicates. The data are not longitudinal, i.e., different mice are
>> used at each time point. We had to remove one profile from a time
>> point because it was not matching our quality criteria, so the data
>> became not properly balanced.
>>
>> On these data I am fitting the following model:
>>
>> lme4Model.full <- lmer(values ~ week * genotype + (week |
>> probesets), data = dataset, REML = FALSE)
>>
>> where 'values' stands for the gene expression, 'week' is a scaled
>> numeric reporting the age of the mice in weeks, and 'genotype' is a
>>   factor with two levels representing the two different genetic
>> backgrounds. Each single gene is let having its own random
>> intercept and slope.
>>
>> What puzzles me is that when I compare this model with its simpler,
>>   non-mixed version:
>>
>> lmModel.full <- lm(values ~ week * genotype, data = dataset)
>>
>> I obtain the same coefficients but different standard errors (see
>> below). Furthermore, while the interaction coefficient is not
>> significant in the simple linear model, it becomes highly
>> significant in the mixed model, at least according to these ANOVA
>> tests:
>>
>> lmModel <- lm(values ~ week + genotype, data = dataset) lme4Model
>> <- lmer(values ~ week + genotype + (week | probesets), data =
>> dataset, REML = FALSE)
>>
>>> anova(lmModel.full, lmModel)
>> Analysis of Variance Table
>>
>> Model 1: values ~ week * genotype Model 2: values ~ week +
>> genotype Res.Df     RSS Df Sum of Sq      F Pr(>F) 1 414341
>> 2162664 2 414342 2162666 -1   -2.4697 0.4732 0.4915
>>
>>> anova(lme4Model.full, lme4Model)
>> Data: dataset Models: lme4Model: values ~ week + genotype + (week |
>> probesets) lme4Model.full: values ~ week * genotype + (week |
>> probesets) Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)
>> lme4Model       7 72376 72452 -36181    72362 lme4Model.full  8
>> 72325 72413 -36155    72309 52.472      1  4.364e-13 ***
>>
>> Assessing whether the interaction coefficient is significant is
>> actually the aim of my study, and having two totally different
>> answers confuses me. My understanding is that the mixed model
>> better catches the variance structure of the data and thus it is
>> able to better estimate the standard errors and p-values of the
>> coefficients. Is this correct? In other words, can I confidently
>> claim that the p-values obtained from the mixed models are "the
>> correct ones" and that the interaction term is actually
>> significant?
>     As far as I can tell from what you've posted, the result given by
> lme4 is indeed (or certainly could be/I have no reason to believe it
> is not) correct. I'm not sure of the best way to explain the result to
> you, though.  Adding the variation among probesets to the model does
> indeed explain a lot of variation that would otherwise end up being
> modeled as error and filtering into the standard errors.  It would be
> nice to understand the differences by visualizing the different
> models, but with such a large data set it could be challenging ...
> One thing that might be interesting would be plotting the residuals from
> a model of only probeset variation (i.e., values ~ (week|probeset))
> and seeing how the week*genotype pattern was (hopefully) clarified.
>
>> My apologies if this issue has been already posted on this list.
>> Despite having seen multiple posts here on similar topics, I have
>> not been able to find an answer to these questions.
>>
>> Thanks in advance for your help. Any suggestion is very welcome.
>>
>> Regards,
>>
>> Vincenzo
>>
>>
>>> summary(lme4Model.full)
>> Linear mixed model fit by maximum likelihood  ['lmerMod'] Formula:
>> values ~ week * genotype + (week | probesets) Data: dataset
>>
>> AIC      BIC   logLik deviance df.resid 72325.3  72412.8 -36154.7
>> 72309.3   414337
>>
>> Scaled residuals: Min       1Q   Median       3Q      Max -13.2622
>> -0.5246   0.0128   0.5270  23.3456
>>
>> Random effects: Groups    Name        Variance Std.Dev. Corr
>> probesets (Intercept) 5.167338 2.27318 week        0.005029 0.07092
>> -0.23 Residual              0.047063 0.21694 Number of obs: 414345,
>> groups:  probesets, 18015
>>
>> Fixed effects: Estimate Std. Error t value (Intercept)
>> 6.5727983  0.0169414   388.0 week             -0.0151336  0.0006823
>> -22.2 genotypeCSB       0.2405300  0.0007121   337.8
>> week:genotypeCSB  0.0050430  0.0006962     7.2
>>
>> Correlation of Fixed Effects: (Intr) week   gntCSB week
>> -0.177 genotypeCSB -0.015 -0.028 wk:gntypCSB -0.001 -0.392 -0.054
>>
>>
>>> summary(lmModel.full)
>> Call: lm(formula = values ~ week * genotype, data = dataset)
>>
>> Residuals: Min      1Q  Median      3Q     Max -5.7560 -1.9881
>> 0.0083  1.6823  7.6407
>>
>> Coefficients: Estimate Std. Error  t value Pr(>|t|) (Intercept)
>> 6.572798   0.004407 1491.384  < 2e-16 *** week
>> -0.015134   0.004547   -3.329 0.000873 *** genotypeCSB
>> 0.240530   0.007500   32.072  < 2e-16 *** week:genotypeCSB
>> 0.005043   0.007331    0.688 0.491537 --- Signif. codes:  0 ‘***’
>> 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>> Residual standard error: 2.285 on 414341 degrees of freedom
>> Multiple R-squared:  0.002491,	Adjusted R-squared:  0.002484
>> F-statistic: 344.9 on 3 and 414341 DF,  p-value: < 2.2e-16
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
>
> iQEcBAEBAgAGBQJUniiHAAoJEOCV5YRblxUH5GEH/R57rvnCi7OYj0HEV+1dQ9sv
> awXYPd9/q7t2ZPVyrJ8OdVL2+ntVe7KYKFy28D2uRa5eyyH6/jaoy9nSlGI4Gvd0
> dslGQYlIpSw9LmOHY1BPcQYZuqEoJoHlbonbX+00AwgANdanP0CpSWNzNVRmbcUL
> ftPAErAaJecb7yu56+I2Yz5ugN3NYrqNdWvTV/HYxt5emjx45gQdQd4cQRTfKw3n
> JkcM8DMRrOjvN6w2H8Pgps/yE3W+nx5VsgBaSdmagwTIfke6aZ90+55jMhjbDXNJ
> xsbFzPc3CUA5SUO7qQHwoteMz8QlfRp3D8SgfmdfaPixskWCdMHjxAws+0ePAhw=
> =0jkf
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

	[[alternative HTML version deleted]]