[R-sig-ME] standard error and statistical significance in lmer versus lm

Peter Claussen dakotajudo at me.com
Fri Jan 2 16:34:18 CET 2015


Vincenzo,

If I’m reading this correctly, I don’t think you’re comparing the same models, lm vs lmer

Using lm, you have two models
	values ~ week + genotype
	values ~ week * genotype

Using lmer, you have two different models.
	week + genotype + (week | probesets)
	week * genotype + (week | probesets)

If you were to include (week:probesets) in your lm models, and compare
	values ~ week + genotype + week:probesets
	values ~ week * genotype + week:probesets

would that clarify the confusion about the different results?

Peter


> On Dec 26, 2014, at 1:11 PM, Vincenzo Lagani <vlagani at ics.forth.gr> wrote:
> 
> Dear all,
> 
> I am modelling gene expression data with mixed models using lme4. My 
> goal is to assess whether gene expression globally decreases or 
> increases with time.
> 
> Specifically, the data consist in whole expression profiles measured at 
> five different time points in two different strains of mice. For each 
> time point and each strain there are three replicates. The data are not 
> longitudinal, i.e., different mice are used at each time point. We had 
> to remove one profile from a time point because it was not matching our 
> quality criteria, so the data became not properly balanced.
> 
> On these data I am fitting the following model:
> 
> lme4Model.full <- lmer(values ~ week * genotype + (week | probesets), 
> data = dataset, REML = FALSE)
> 
> where 'values' stands for the gene expression, 'week' is a scaled 
> numeric reporting the age of the mice in weeks, and 'genotype' is a 
> factor with two levels representing the two different genetic 
> backgrounds. Each single gene is let having its own random intercept and 
> slope.
> 
> What puzzles me is that when I compare this model with its simpler, 
> non-mixed version:
> 
> lmModel.full <- lm(values ~ week * genotype, data = dataset)
> 
> I obtain the same coefficients but different standard errors (see 
> below). Furthermore, while the interaction coefficient is not 
> significant in the simple linear model, it becomes highly significant in 
> the mixed model, at least according to these ANOVA tests:
> 
> lmModel <- lm(values ~ week + genotype, data = dataset)
> lme4Model <- lmer(values ~ week + genotype + (week | probesets), data = 
> dataset, REML = FALSE)
> 
>> anova(lmModel.full, lmModel)
> Analysis of Variance Table
> 
> Model 1: values ~ week * genotype
> Model 2: values ~ week + genotype
>   Res.Df     RSS Df Sum of Sq      F Pr(>F)
> 1 414341 2162664
> 2 414342 2162666 -1   -2.4697 0.4732 0.4915
> 
>> anova(lme4Model.full, lme4Model)
> Data: dataset
> Models:
> lme4Model: values ~ week + genotype + (week | probesets)
> lme4Model.full: values ~ week * genotype + (week | probesets)
>                Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)
> lme4Model       7 72376 72452 -36181    72362
> lme4Model.full  8 72325 72413 -36155    72309 52.472      1  4.364e-13 ***
> 
> 
> 
> Assessing whether the interaction coefficient is significant is actually 
> the aim of my study, and having two totally different answers confuses 
> me. My understanding is that the mixed model better catches the variance 
> structure of the data and thus it is able to better estimate the 
> standard errors and p-values of the coefficients. Is this correct? In 
> other words, can I confidently claim that the p-values obtained from the 
> mixed models are "the correct ones" and that the interaction term is 
> actually significant?
> 
> My apologies if this issue has been already posted on this list. Despite 
> having seen multiple posts here on similar topics, I have not been able 
> to find an answer to these questions.
> 
> Thanks in advance for your help. Any suggestion is very welcome.
> 
> Regards,
> 
> Vincenzo
> 
> 
>> summary(lme4Model.full)
> Linear mixed model fit by maximum likelihood  ['lmerMod']
> Formula: values ~ week * genotype + (week | probesets)
>    Data: dataset
> 
>      AIC      BIC   logLik deviance df.resid
>  72325.3  72412.8 -36154.7  72309.3   414337
> 
> Scaled residuals:
>      Min       1Q   Median       3Q      Max
> -13.2622  -0.5246   0.0128   0.5270  23.3456
> 
> Random effects:
>  Groups    Name        Variance Std.Dev. Corr
>  probesets (Intercept) 5.167338 2.27318
>            week        0.005029 0.07092  -0.23
>  Residual              0.047063 0.21694
> Number of obs: 414345, groups:  probesets, 18015
> 
> Fixed effects:
>                    Estimate Std. Error t value
> (Intercept)       6.5727983  0.0169414   388.0
> week             -0.0151336  0.0006823   -22.2
> genotypeCSB       0.2405300  0.0007121   337.8
> week:genotypeCSB  0.0050430  0.0006962     7.2
> 
> Correlation of Fixed Effects:
>             (Intr) week   gntCSB
> week        -0.177
> genotypeCSB -0.015 -0.028
> wk:gntypCSB -0.001 -0.392 -0.054
> 
> 
>> summary(lmModel.full)
> 
> Call:
> lm(formula = values ~ week * genotype, data = dataset)
> 
> Residuals:
>     Min      1Q  Median      3Q     Max
> -5.7560 -1.9881  0.0083  1.6823  7.6407
> 
> Coefficients:
>                   Estimate Std. Error  t value Pr(>|t|)
> (Intercept)       6.572798   0.004407 1491.384  < 2e-16 ***
> week             -0.015134   0.004547   -3.329 0.000873 ***
> genotypeCSB       0.240530   0.007500   32.072  < 2e-16 ***
> week:genotypeCSB  0.005043   0.007331    0.688 0.491537
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> Residual standard error: 2.285 on 414341 degrees of freedom
> Multiple R-squared:  0.002491,	Adjusted R-squared:  0.002484
> F-statistic: 344.9 on 3 and 414341 DF,  p-value: < 2.2e-16
> 
> 
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models



More information about the R-sig-mixed-models mailing list