[R-sig-ME] Mixed model (with interaction) for gene expression and iteration

Tue Jun 2 21:54:31 CEST 2009

On Tue, Jun 2, 2009 at 7:53 AM, Paolo Innocenti
<paolo.innocenti at ebc.uu.se> wrote:
> Dear Douglas and list,

> I am thinking about fitting a mixed model for a microarray experiment using
> lme4, since other specific software seems not suitable for my purposes. I'll
> briefly describe the model and kindly ask for suggestions on the model and
> the workflow I can use to get useful results.

> My response variable Y is gene expression levels for a given gene (say g_i)
> from 120 samples.
> The factor I want to include are:

> Sex: fixed, two levels, M/F.
> Line: 15 randomly picked genotypes from a large outbred population.

> I am interested in:
> - if the gene is differentially expressed in the 2 sexes (effect of sex), in
> the 15 lines (effect of line) and the interaction of the two.

> - the variance component of line = how much of the variance is due to the
> genotype

> - the variance component of the interaction = the genetic variation for
> sexual dimorphism.

> Reading a bit of this mailing list, I came up with these three models:

> m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))

> or

> m2 <- lmer(Y1 ~ sex + (sex|line))

> or

> m3 <- lmer(Y1 ~ sex + (0 + sex|line))

> Which should all be the same model (and indeed they have all the same
> residuals) but different parametrization (see self-contained example below).

Not really.  m2 and m3 are different parameterizations of the same
model but m1 is different.  In models m2 and m3 the random effects for
F and for M are allowed to be correlated within a line.  Thus there
are three parameters determining the variance-covariance of the random
effects.  In model m1 the unconditional distribution of the random
effects for the sex:line interaction is independent of the
distribution of the random effects for line.

> Now, in the light of my needs (see above), which model makes it easier to
> extract the components I need? Also, do they make different assumptions - as
> different levels of independency among levels of random factors?

I would use either m1 or m3 because I find the correlation structure
in m3 easier to understand than that of m2.

> I will need to be able to extract the variance component values in an
> iterative process (i have 18.000 genes): is VarCorr() the way to go?

I think so.  I would definitely look into using refit to update the
fitted model rather than starting from scratch with each of the genes.

I would use the idiom

> unlist(VarCorr(m1))
   sex:line        line
0.002323722 0.016939242

to get the estimated variances.

> VarCorr(m1)$'sex:line'[1]
> VarCorr(m1)$'line'[1]
>
> Last two question: what is the easier way to assess, in an iterative
> process, normality of residuals, and what is a sensible way to assess
> significant differential expression of genes (since I guess I can't get
> p-values and then apply FDR correction?)
>
> Thanks a lot for reading so far and I'll be grateful for any kind of help.
> paolo
>
> Self-contained example:
>
> Y1 <- as.numeric(
> c("11.6625","11.3243","11.7819","11.5032","11.7578","11.9379","11.8491",
> "11.9035","11.2042","11.0344","11.5137","11.1995","11.6327","11.7392",
> "11.9869","11.6955","11.5631","11.7159","11.8435","11.5814","12.0756",
> "12.3428","12.3342","11.9883","11.6067","11.6102","11.6517","11.4444",
> "11.9567","12.0478","11.9683","11.8207","11.5860","11.6028","11.6522",
> "11.6775","12.3570","12.2266","12.2444","12.1369","11.2573","11.4577",
> "11.4432","11.2994","11.8486","11.9258","11.9864","11.9964","11.2806",
> "11.2527","11.3672","11.0791","11.9501","11.7223","11.9825","11.8114",
> "11.6116","11.4284","11.3731","11.6942","12.2153","12.0101","12.2150",
> "12.1932","11.5617","11.3761","11.4813","11.7503","11.9889","12.1530",
> "12.3299","12.4436","11.4566","11.4614","11.5527","11.3513","11.9764",
> "11.8810","12.0999","11.9083","11.4870","11.6764","11.3973","11.4507",
> "12.1141","11.9906","12.1118","11.9728","11.3382","11.4146","11.4590",
> "11.2527","12.1101","12.0448","12.2191","11.8317","11.3982","11.3555",
> "11.3897","11.7731","11.9749","11.8666","12.1984","12.0350","11.4642",
> "11.4509","11.5552","11.4346","12.0714","11.7136","11.9019","11.8158",
> "11.3132","11.3121","11.1612","11.2073","11.6658","11.7879","11.7847",
> "11.5300"))
> sex <- factor(rep(c("F","M"), 15, each=4))
> line <- factor(rep(1:15, each=8))
> m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))
> m2 <- lmer(Y1 ~ sex + (sex|line))
> m3 <- lmer(Y1 ~ sex + (0 + sex|line))
> VarCorr(m1)$'sex:line'[1]
> VarCorr(m1)$'line'[1]
>
> Output:
>
>>> m1
>>
>> Linear mixed model fit by REML Formula: Y1 ~ sex + (1 | line) + (1 |
>> sex:line)    AIC   BIC logLik deviance REMLdev
>>  -91.13 -77.2  50.57   -111.1  -101.1
>> Random effects:
>>  Groups   Name        Variance  Std.Dev.
>>  sex:line (Intercept) 0.0023237 0.048205
>>  line     (Intercept) 0.0169393 0.130151
>>  Residual             0.0168238 0.129707
>> Number of obs: 120, groups: sex:line, 30; line, 15
>>
>> Fixed effects:
>>            Estimate Std. Error t value
>> (Intercept) 11.45977    0.03955  289.72
>> sexM         0.52992    0.02951   17.96
>>
>> Correlation of Fixed Effects:
>>     (Intr)
>> sexM -0.373
>>>
>>> m2
>>
>> Linear mixed model fit by REML Formula: Y1 ~ sex + (sex | line)  AIC
>>  BIC logLik deviance REMLdev
>>  -90 -73.27     51   -112.1    -102
>> Random effects:
>>  Groups   Name        Variance  Std.Dev. Corr   line     (Intercept)
>> 0.0152993 0.123691                sexM        0.0046474 0.068172 0.194
>>  Residual             0.0168238 0.129707       Number of obs: 120, groups:
>> line, 15
>>
>> Fixed effects:
>>            Estimate Std. Error t value
>> (Intercept) 11.45977    0.03606   317.8
>> sexM         0.52992    0.02951    18.0
>>
>> Correlation of Fixed Effects:
>>     (Intr)
>> sexM -0.161
>>>
>>> m3
>>
>> Linear mixed model fit by REML Formula: Y1 ~ sex + (0 + sex | line)  AIC
>>  BIC logLik deviance REMLdev
>>  -90 -73.27     51   -112.1    -102
>> Random effects:
>>  Groups   Name Variance Std.Dev. Corr   line     sexF 0.015299 0.12369
>>              sexM 0.023227 0.15240  0.899  Residual      0.016824 0.12971
>>      Number of obs: 120, groups: line, 15
>>
>> Fixed effects:
>>            Estimate Std. Error t value
>> (Intercept) 11.45977    0.03606   317.8
>> sexM         0.52991    0.02951    18.0
>>
>> Correlation of Fixed Effects:
>>     (Intr)
>> sexM -0.161
>
>
>
> --
> Paolo Innocenti
> Department of Animal Ecology, EBC
> Uppsala University
> Norbyvägen 18D
> 75236 Uppsala, Sweden
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>