[R-sig-ME] standard error and statistical significance in lmer versus lm
Peter Claussen
dakotajudo at me.com
Fri Jan 2 16:34:18 CET 2015
Vincenzo,
If I’m reading this correctly, I don’t think you’re comparing the same models, lm vs lmer
Using lm, you have two models
values ~ week + genotype
values ~ week * genotype
Using lmer, you have two different models.
week + genotype + (week | probesets)
week * genotype + (week | probesets)
If you were to include (week:probesets) in your lm models, and compare
values ~ week + genotype + week:probesets
values ~ week * genotype + week:probesets
would that clarify the confusion about the different results?
Peter
> On Dec 26, 2014, at 1:11 PM, Vincenzo Lagani <vlagani at ics.forth.gr> wrote:
>
> Dear all,
>
> I am modelling gene expression data with mixed models using lme4. My
> goal is to assess whether gene expression globally decreases or
> increases with time.
>
> Specifically, the data consist in whole expression profiles measured at
> five different time points in two different strains of mice. For each
> time point and each strain there are three replicates. The data are not
> longitudinal, i.e., different mice are used at each time point. We had
> to remove one profile from a time point because it was not matching our
> quality criteria, so the data became not properly balanced.
>
> On these data I am fitting the following model:
>
> lme4Model.full <- lmer(values ~ week * genotype + (week | probesets),
> data = dataset, REML = FALSE)
>
> where 'values' stands for the gene expression, 'week' is a scaled
> numeric reporting the age of the mice in weeks, and 'genotype' is a
> factor with two levels representing the two different genetic
> backgrounds. Each single gene is let having its own random intercept and
> slope.
>
> What puzzles me is that when I compare this model with its simpler,
> non-mixed version:
>
> lmModel.full <- lm(values ~ week * genotype, data = dataset)
>
> I obtain the same coefficients but different standard errors (see
> below). Furthermore, while the interaction coefficient is not
> significant in the simple linear model, it becomes highly significant in
> the mixed model, at least according to these ANOVA tests:
>
> lmModel <- lm(values ~ week + genotype, data = dataset)
> lme4Model <- lmer(values ~ week + genotype + (week | probesets), data =
> dataset, REML = FALSE)
>
>> anova(lmModel.full, lmModel)
> Analysis of Variance Table
>
> Model 1: values ~ week * genotype
> Model 2: values ~ week + genotype
> Res.Df RSS Df Sum of Sq F Pr(>F)
> 1 414341 2162664
> 2 414342 2162666 -1 -2.4697 0.4732 0.4915
>
>> anova(lme4Model.full, lme4Model)
> Data: dataset
> Models:
> lme4Model: values ~ week + genotype + (week | probesets)
> lme4Model.full: values ~ week * genotype + (week | probesets)
> Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
> lme4Model 7 72376 72452 -36181 72362
> lme4Model.full 8 72325 72413 -36155 72309 52.472 1 4.364e-13 ***
>
>
>
> Assessing whether the interaction coefficient is significant is actually
> the aim of my study, and having two totally different answers confuses
> me. My understanding is that the mixed model better catches the variance
> structure of the data and thus it is able to better estimate the
> standard errors and p-values of the coefficients. Is this correct? In
> other words, can I confidently claim that the p-values obtained from the
> mixed models are "the correct ones" and that the interaction term is
> actually significant?
>
> My apologies if this issue has been already posted on this list. Despite
> having seen multiple posts here on similar topics, I have not been able
> to find an answer to these questions.
>
> Thanks in advance for your help. Any suggestion is very welcome.
>
> Regards,
>
> Vincenzo
>
>
>> summary(lme4Model.full)
> Linear mixed model fit by maximum likelihood ['lmerMod']
> Formula: values ~ week * genotype + (week | probesets)
> Data: dataset
>
> AIC BIC logLik deviance df.resid
> 72325.3 72412.8 -36154.7 72309.3 414337
>
> Scaled residuals:
> Min 1Q Median 3Q Max
> -13.2622 -0.5246 0.0128 0.5270 23.3456
>
> Random effects:
> Groups Name Variance Std.Dev. Corr
> probesets (Intercept) 5.167338 2.27318
> week 0.005029 0.07092 -0.23
> Residual 0.047063 0.21694
> Number of obs: 414345, groups: probesets, 18015
>
> Fixed effects:
> Estimate Std. Error t value
> (Intercept) 6.5727983 0.0169414 388.0
> week -0.0151336 0.0006823 -22.2
> genotypeCSB 0.2405300 0.0007121 337.8
> week:genotypeCSB 0.0050430 0.0006962 7.2
>
> Correlation of Fixed Effects:
> (Intr) week gntCSB
> week -0.177
> genotypeCSB -0.015 -0.028
> wk:gntypCSB -0.001 -0.392 -0.054
>
>
>> summary(lmModel.full)
>
> Call:
> lm(formula = values ~ week * genotype, data = dataset)
>
> Residuals:
> Min 1Q Median 3Q Max
> -5.7560 -1.9881 0.0083 1.6823 7.6407
>
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 6.572798 0.004407 1491.384 < 2e-16 ***
> week -0.015134 0.004547 -3.329 0.000873 ***
> genotypeCSB 0.240530 0.007500 32.072 < 2e-16 ***
> week:genotypeCSB 0.005043 0.007331 0.688 0.491537
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 2.285 on 414341 degrees of freedom
> Multiple R-squared: 0.002491, Adjusted R-squared: 0.002484
> F-statistic: 344.9 on 3 and 414341 DF, p-value: < 2.2e-16
>
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
More information about the R-sig-mixed-models
mailing list