[R-sig-ME] Mixed model (with interaction) for gene expression and iteration
Juan Pedro Steibel
steibelj at msu.edu
Wed Jun 3 19:04:55 CEST 2009
Hello Paolo,
We are also starting to use lmer for gene expression analysis (genetical
genomics too) so here are my thoughts.
Having two equivalent parameterizations, I would go with the
computationally fastest one (a couple more miliseconds per model, easily
add up to many hours when analyzing highly dimensional datasets). You
can time the analysis for, say, 100 transcripts and go from there.
~note: other users commented on your models not being equivalent~
For p-values:
You could use LRT to test for variance components.
This is standard practice in genetic epidemiology: fit a model with and
without the random effect in question, then compare the log (residual)
likelihood ratio to a chi-square statistic. FDR can be used on top of
that to account for multiple tests. Of course, now you have to fit three
models (one null models for each VC), so your cpu time has just
multiplied by almost 3. Definitely using refit and update will help with
compute time when having so many models.
I use sobj<-summary(result) as an intermediate step to get the info
(although this may add to the computational burden and other suggestions
you got may be faster and more efficient),
then ask for slots:
@REmat
@coefs
...for getting estimates of variance components and fixed effects.
@coefs gives you a t-statistic for fixed effects too... you could take a
stab at an approximated df method and compute an (approximated) p-value.
I know that doing so can attract a lot of criticism in this list, but
when you are fitting a several million models (10000s of transcripts and
1000s of genomic positions as in my case), the mcmc approximation is
(unfortunately) not computationally feasible.
Hope this helps!
JP
Paolo Innocenti wrote:
> Dear Douglas and list,
>
> I am thinking about fitting a mixed model for a microarray experiment
> using lme4, since other specific software seems not suitable for my
> purposes. I'll briefly describe the model and kindly ask for
> suggestions on the model and the workflow I can use to get useful
> results.
>
> My response variable Y is gene expression levels for a given gene (say
> g_i) from 120 samples.
> The factor I want to include are:
>
> Sex: fixed, two levels, M/F.
> Line: 15 randomly picked genotypes from a large outbred population.
>
> I am interested in:
> - if the gene is differentially expressed in the 2 sexes (effect of
> sex), in the 15 lines (effect of line) and the interaction of the two.
>
> - the variance component of line = how much of the variance is due to
> the genotype
>
> - the variance component of the interaction = the genetic variation
> for sexual dimorphism.
>
> Reading a bit of this mailing list, I came up with these three models:
>
> m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))
>
> or
>
> m2 <- lmer(Y1 ~ sex + (sex|line))
>
> or
>
> m3 <- lmer(Y1 ~ sex + (0 + sex|line))
>
> Which should all be the same model (and indeed they have all the same
> residuals) but different parametrization (see self-contained example
> below).
>
> Now, in the light of my needs (see above), which model makes it easier
> to extract the components I need? Also, do they make different
> assumptions - as different levels of independency among levels of
> random factors?
>
> I will need to be able to extract the variance component values in an
> iterative process (i have 18.000 genes): is VarCorr() the way to go?
>
> VarCorr(m1)$'sex:line'[1]
> VarCorr(m1)$'line'[1]
>
> Last two question: what is the easier way to assess, in an iterative
> process, normality of residuals, and what is a sensible way to assess
> significant differential expression of genes (since I guess I can't
> get p-values and then apply FDR correction?)
>
> Thanks a lot for reading so far and I'll be grateful for any kind of
> help.
> paolo
>
> Self-contained example:
>
> Y1 <- as.numeric(
> c("11.6625","11.3243","11.7819","11.5032","11.7578","11.9379","11.8491",
> "11.9035","11.2042","11.0344","11.5137","11.1995","11.6327","11.7392",
> "11.9869","11.6955","11.5631","11.7159","11.8435","11.5814","12.0756",
> "12.3428","12.3342","11.9883","11.6067","11.6102","11.6517","11.4444",
> "11.9567","12.0478","11.9683","11.8207","11.5860","11.6028","11.6522",
> "11.6775","12.3570","12.2266","12.2444","12.1369","11.2573","11.4577",
> "11.4432","11.2994","11.8486","11.9258","11.9864","11.9964","11.2806",
> "11.2527","11.3672","11.0791","11.9501","11.7223","11.9825","11.8114",
> "11.6116","11.4284","11.3731","11.6942","12.2153","12.0101","12.2150",
> "12.1932","11.5617","11.3761","11.4813","11.7503","11.9889","12.1530",
> "12.3299","12.4436","11.4566","11.4614","11.5527","11.3513","11.9764",
> "11.8810","12.0999","11.9083","11.4870","11.6764","11.3973","11.4507",
> "12.1141","11.9906","12.1118","11.9728","11.3382","11.4146","11.4590",
> "11.2527","12.1101","12.0448","12.2191","11.8317","11.3982","11.3555",
> "11.3897","11.7731","11.9749","11.8666","12.1984","12.0350","11.4642",
> "11.4509","11.5552","11.4346","12.0714","11.7136","11.9019","11.8158",
> "11.3132","11.3121","11.1612","11.2073","11.6658","11.7879","11.7847",
> "11.5300"))
> sex <- factor(rep(c("F","M"), 15, each=4))
> line <- factor(rep(1:15, each=8))
> m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))
> m2 <- lmer(Y1 ~ sex + (sex|line))
> m3 <- lmer(Y1 ~ sex + (0 + sex|line))
> VarCorr(m1)$'sex:line'[1]
> VarCorr(m1)$'line'[1]
>
> Output:
>
>>> m1
>> Linear mixed model fit by REML Formula: Y1 ~ sex + (1 | line) + (1 |
>> sex:line) AIC BIC logLik deviance REMLdev
>> -91.13 -77.2 50.57 -111.1 -101.1
>> Random effects:
>> Groups Name Variance Std.Dev.
>> sex:line (Intercept) 0.0023237 0.048205
>> line (Intercept) 0.0169393 0.130151
>> Residual 0.0168238 0.129707
>> Number of obs: 120, groups: sex:line, 30; line, 15
>>
>> Fixed effects:
>> Estimate Std. Error t value
>> (Intercept) 11.45977 0.03955 289.72
>> sexM 0.52992 0.02951 17.96
>>
>> Correlation of Fixed Effects:
>> (Intr)
>> sexM -0.373
>>> m2
>> Linear mixed model fit by REML Formula: Y1 ~ sex + (sex | line)
>> AIC BIC logLik deviance REMLdev
>> -90 -73.27 51 -112.1 -102
>> Random effects:
>> Groups Name Variance Std.Dev. Corr line (Intercept)
>> 0.0152993 0.123691 sexM 0.0046474 0.068172
>> 0.194 Residual 0.0168238 0.129707 Number of obs:
>> 120, groups: line, 15
>>
>> Fixed effects:
>> Estimate Std. Error t value
>> (Intercept) 11.45977 0.03606 317.8
>> sexM 0.52992 0.02951 18.0
>>
>> Correlation of Fixed Effects:
>> (Intr)
>> sexM -0.161
>>> m3
>> Linear mixed model fit by REML Formula: Y1 ~ sex + (0 + sex | line)
>> AIC BIC logLik deviance REMLdev
>> -90 -73.27 51 -112.1 -102
>> Random effects:
>> Groups Name Variance Std.Dev. Corr line sexF 0.015299
>> 0.12369 sexM 0.023227 0.15240 0.899 Residual
>> 0.016824 0.12971 Number of obs: 120, groups: line, 15
>>
>> Fixed effects:
>> Estimate Std. Error t value
>> (Intercept) 11.45977 0.03606 317.8
>> sexM 0.52991 0.02951 18.0
>>
>> Correlation of Fixed Effects:
>> (Intr)
>> sexM -0.161
>
>
>
--
=============================
Juan Pedro Steibel
Assistant Professor
Statistical Genetics and Genomics
Department of Animal Science &
Department of Fisheries and Wildlife
Michigan State University
1205-I Anthony Hall
East Lansing, MI
48824 USA
Phone: 1-517-353-5102
E-mail: steibelj at msu.edu
More information about the R-sig-mixed-models
mailing list