[R-sig-ME] Mixed model (with interaction) for gene expression and iteration

Paolo Innocenti paolo.innocenti at ebc.uu.se
Tue Jun 2 14:53:42 CEST 2009


Dear Douglas and list,

I am thinking about fitting a mixed model for a microarray experiment 
using lme4, since other specific software seems not suitable for my 
purposes. I'll briefly describe the model and kindly ask for suggestions 
on the model and the workflow I can use to get useful results.

My response variable Y is gene expression levels for a given gene (say 
g_i) from 120 samples.
The factor I want to include are:

Sex: fixed, two levels, M/F.
Line: 15 randomly picked genotypes from a large outbred population.

I am interested in:
- if the gene is differentially expressed in the 2 sexes (effect of 
sex), in the 15 lines (effect of line) and the interaction of the two.

- the variance component of line = how much of the variance is due to 
the genotype

- the variance component of the interaction = the genetic variation for 
sexual dimorphism.

Reading a bit of this mailing list, I came up with these three models:

m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))

or

m2 <- lmer(Y1 ~ sex + (sex|line))

or

m3 <- lmer(Y1 ~ sex + (0 + sex|line))

Which should all be the same model (and indeed they have all the same 
residuals) but different parametrization (see self-contained example 
below).

Now, in the light of my needs (see above), which model makes it easier 
to extract the components I need? Also, do they make different 
assumptions - as different levels of independency among levels of random 
factors?

I will need to be able to extract the variance component values in an 
iterative process (i have 18.000 genes): is VarCorr() the way to go?

VarCorr(m1)$'sex:line'[1]
VarCorr(m1)$'line'[1]

Last two question: what is the easier way to assess, in an iterative 
process, normality of residuals, and what is a sensible way to assess 
significant differential expression of genes (since I guess I can't get 
p-values and then apply FDR correction?)

Thanks a lot for reading so far and I'll be grateful for any kind of help.
paolo

Self-contained example:

Y1 <- as.numeric(
c("11.6625","11.3243","11.7819","11.5032","11.7578","11.9379","11.8491",
"11.9035","11.2042","11.0344","11.5137","11.1995","11.6327","11.7392",
"11.9869","11.6955","11.5631","11.7159","11.8435","11.5814","12.0756",
"12.3428","12.3342","11.9883","11.6067","11.6102","11.6517","11.4444",
"11.9567","12.0478","11.9683","11.8207","11.5860","11.6028","11.6522",
"11.6775","12.3570","12.2266","12.2444","12.1369","11.2573","11.4577",
"11.4432","11.2994","11.8486","11.9258","11.9864","11.9964","11.2806",
"11.2527","11.3672","11.0791","11.9501","11.7223","11.9825","11.8114",
"11.6116","11.4284","11.3731","11.6942","12.2153","12.0101","12.2150",
"12.1932","11.5617","11.3761","11.4813","11.7503","11.9889","12.1530",
"12.3299","12.4436","11.4566","11.4614","11.5527","11.3513","11.9764",
"11.8810","12.0999","11.9083","11.4870","11.6764","11.3973","11.4507",
"12.1141","11.9906","12.1118","11.9728","11.3382","11.4146","11.4590",
"11.2527","12.1101","12.0448","12.2191","11.8317","11.3982","11.3555",
"11.3897","11.7731","11.9749","11.8666","12.1984","12.0350","11.4642",
"11.4509","11.5552","11.4346","12.0714","11.7136","11.9019","11.8158",
"11.3132","11.3121","11.1612","11.2073","11.6658","11.7879","11.7847",
"11.5300"))
sex <- factor(rep(c("F","M"), 15, each=4))
line <- factor(rep(1:15, each=8))
m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))
m2 <- lmer(Y1 ~ sex + (sex|line))
m3 <- lmer(Y1 ~ sex + (0 + sex|line))
VarCorr(m1)$'sex:line'[1]
VarCorr(m1)$'line'[1]

Output:

>> m1
> Linear mixed model fit by REML 
> Formula: Y1 ~ sex + (1 | line) + (1 | sex:line) 
>     AIC   BIC logLik deviance REMLdev
>  -91.13 -77.2  50.57   -111.1  -101.1
> Random effects:
>  Groups   Name        Variance  Std.Dev.
>  sex:line (Intercept) 0.0023237 0.048205
>  line     (Intercept) 0.0169393 0.130151
>  Residual             0.0168238 0.129707
> Number of obs: 120, groups: sex:line, 30; line, 15
> 
> Fixed effects:
>             Estimate Std. Error t value
> (Intercept) 11.45977    0.03955  289.72
> sexM         0.52992    0.02951   17.96
> 
> Correlation of Fixed Effects:
>      (Intr)
> sexM -0.373
>> m2
> Linear mixed model fit by REML 
> Formula: Y1 ~ sex + (sex | line) 
>  AIC    BIC logLik deviance REMLdev
>  -90 -73.27     51   -112.1    -102
> Random effects:
>  Groups   Name        Variance  Std.Dev. Corr  
>  line     (Intercept) 0.0152993 0.123691       
>           sexM        0.0046474 0.068172 0.194 
>  Residual             0.0168238 0.129707       
> Number of obs: 120, groups: line, 15
> 
> Fixed effects:
>             Estimate Std. Error t value
> (Intercept) 11.45977    0.03606   317.8
> sexM         0.52992    0.02951    18.0
> 
> Correlation of Fixed Effects:
>      (Intr)
> sexM -0.161
>> m3
> Linear mixed model fit by REML 
> Formula: Y1 ~ sex + (0 + sex | line) 
>  AIC    BIC logLik deviance REMLdev
>  -90 -73.27     51   -112.1    -102
> Random effects:
>  Groups   Name Variance Std.Dev. Corr  
>  line     sexF 0.015299 0.12369        
>           sexM 0.023227 0.15240  0.899 
>  Residual      0.016824 0.12971        
> Number of obs: 120, groups: line, 15
> 
> Fixed effects:
>             Estimate Std. Error t value
> (Intercept) 11.45977    0.03606   317.8
> sexM         0.52991    0.02951    18.0
> 
> Correlation of Fixed Effects:
>      (Intr)
> sexM -0.161



-- 
Paolo Innocenti
Department of Animal Ecology, EBC
Uppsala University
Norbyvägen 18D
75236 Uppsala, Sweden




More information about the R-sig-mixed-models mailing list