[R-sig-ME] Mixed model (with interaction) for gene expression and iteration

Juan Pedro Steibel steibelj at msu.edu
Wed Jun 3 19:04:55 CEST 2009


Hello Paolo,
We are also starting to use lmer for gene expression analysis (genetical 
genomics too) so here are my thoughts.

Having two equivalent parameterizations, I would go with the 
computationally fastest one (a couple more miliseconds per model, easily 
add up to many hours when analyzing highly dimensional datasets). You 
can time the analysis for, say, 100 transcripts and go from there.

~note: other users commented on your models not being equivalent~

For p-values:

You could use LRT to test for variance components.

This is standard practice in genetic epidemiology: fit a model with and 
without the random effect in question, then compare the log (residual) 
likelihood ratio to a chi-square statistic. FDR can be used on top of 
that to account for multiple tests. Of course, now you have to fit three 
models (one null models for each VC), so your cpu  time has just 
multiplied by almost 3. Definitely using refit and update will help with 
compute time when having so many models.

I use sobj<-summary(result) as an intermediate step to get the info 
(although this may add to the computational burden and other suggestions 
you got may be faster and more efficient),

then ask for slots:
@REmat
@coefs
...for getting estimates of variance components and fixed effects.

@coefs gives you a t-statistic for fixed effects too... you could take a 
stab at an approximated df method and compute an (approximated) p-value.
I know that doing so can attract a lot of criticism in this list, but 
when you are fitting a several million models (10000s of transcripts and 
1000s of genomic positions as in my case), the mcmc approximation is 
(unfortunately) not computationally feasible.


Hope this helps!
JP



Paolo Innocenti wrote:
> Dear Douglas and list,
>
> I am thinking about fitting a mixed model for a microarray experiment 
> using lme4, since other specific software seems not suitable for my 
> purposes. I'll briefly describe the model and kindly ask for 
> suggestions on the model and the workflow I can use to get useful 
> results.
>
> My response variable Y is gene expression levels for a given gene (say 
> g_i) from 120 samples.
> The factor I want to include are:
>
> Sex: fixed, two levels, M/F.
> Line: 15 randomly picked genotypes from a large outbred population.
>
> I am interested in:
> - if the gene is differentially expressed in the 2 sexes (effect of 
> sex), in the 15 lines (effect of line) and the interaction of the two.
>
> - the variance component of line = how much of the variance is due to 
> the genotype
>
> - the variance component of the interaction = the genetic variation 
> for sexual dimorphism.
>
> Reading a bit of this mailing list, I came up with these three models:
>
> m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))
>
> or
>
> m2 <- lmer(Y1 ~ sex + (sex|line))
>
> or
>
> m3 <- lmer(Y1 ~ sex + (0 + sex|line))
>
> Which should all be the same model (and indeed they have all the same 
> residuals) but different parametrization (see self-contained example 
> below).
>
> Now, in the light of my needs (see above), which model makes it easier 
> to extract the components I need? Also, do they make different 
> assumptions - as different levels of independency among levels of 
> random factors?
>
> I will need to be able to extract the variance component values in an 
> iterative process (i have 18.000 genes): is VarCorr() the way to go?
>
> VarCorr(m1)$'sex:line'[1]
> VarCorr(m1)$'line'[1]
>
> Last two question: what is the easier way to assess, in an iterative 
> process, normality of residuals, and what is a sensible way to assess 
> significant differential expression of genes (since I guess I can't 
> get p-values and then apply FDR correction?)
>
> Thanks a lot for reading so far and I'll be grateful for any kind of 
> help.
> paolo
>
> Self-contained example:
>
> Y1 <- as.numeric(
> c("11.6625","11.3243","11.7819","11.5032","11.7578","11.9379","11.8491",
> "11.9035","11.2042","11.0344","11.5137","11.1995","11.6327","11.7392",
> "11.9869","11.6955","11.5631","11.7159","11.8435","11.5814","12.0756",
> "12.3428","12.3342","11.9883","11.6067","11.6102","11.6517","11.4444",
> "11.9567","12.0478","11.9683","11.8207","11.5860","11.6028","11.6522",
> "11.6775","12.3570","12.2266","12.2444","12.1369","11.2573","11.4577",
> "11.4432","11.2994","11.8486","11.9258","11.9864","11.9964","11.2806",
> "11.2527","11.3672","11.0791","11.9501","11.7223","11.9825","11.8114",
> "11.6116","11.4284","11.3731","11.6942","12.2153","12.0101","12.2150",
> "12.1932","11.5617","11.3761","11.4813","11.7503","11.9889","12.1530",
> "12.3299","12.4436","11.4566","11.4614","11.5527","11.3513","11.9764",
> "11.8810","12.0999","11.9083","11.4870","11.6764","11.3973","11.4507",
> "12.1141","11.9906","12.1118","11.9728","11.3382","11.4146","11.4590",
> "11.2527","12.1101","12.0448","12.2191","11.8317","11.3982","11.3555",
> "11.3897","11.7731","11.9749","11.8666","12.1984","12.0350","11.4642",
> "11.4509","11.5552","11.4346","12.0714","11.7136","11.9019","11.8158",
> "11.3132","11.3121","11.1612","11.2073","11.6658","11.7879","11.7847",
> "11.5300"))
> sex <- factor(rep(c("F","M"), 15, each=4))
> line <- factor(rep(1:15, each=8))
> m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))
> m2 <- lmer(Y1 ~ sex + (sex|line))
> m3 <- lmer(Y1 ~ sex + (0 + sex|line))
> VarCorr(m1)$'sex:line'[1]
> VarCorr(m1)$'line'[1]
>
> Output:
>
>>> m1
>> Linear mixed model fit by REML Formula: Y1 ~ sex + (1 | line) + (1 | 
>> sex:line)     AIC   BIC logLik deviance REMLdev
>>  -91.13 -77.2  50.57   -111.1  -101.1
>> Random effects:
>>  Groups   Name        Variance  Std.Dev.
>>  sex:line (Intercept) 0.0023237 0.048205
>>  line     (Intercept) 0.0169393 0.130151
>>  Residual             0.0168238 0.129707
>> Number of obs: 120, groups: sex:line, 30; line, 15
>>
>> Fixed effects:
>>             Estimate Std. Error t value
>> (Intercept) 11.45977    0.03955  289.72
>> sexM         0.52992    0.02951   17.96
>>
>> Correlation of Fixed Effects:
>>      (Intr)
>> sexM -0.373
>>> m2
>> Linear mixed model fit by REML Formula: Y1 ~ sex + (sex | line) 
>>  AIC    BIC logLik deviance REMLdev
>>  -90 -73.27     51   -112.1    -102
>> Random effects:
>>  Groups   Name        Variance  Std.Dev. Corr   line     (Intercept) 
>> 0.0152993 0.123691                 sexM        0.0046474 0.068172 
>> 0.194  Residual             0.0168238 0.129707       Number of obs: 
>> 120, groups: line, 15
>>
>> Fixed effects:
>>             Estimate Std. Error t value
>> (Intercept) 11.45977    0.03606   317.8
>> sexM         0.52992    0.02951    18.0
>>
>> Correlation of Fixed Effects:
>>      (Intr)
>> sexM -0.161
>>> m3
>> Linear mixed model fit by REML Formula: Y1 ~ sex + (0 + sex | line) 
>>  AIC    BIC logLik deviance REMLdev
>>  -90 -73.27     51   -112.1    -102
>> Random effects:
>>  Groups   Name Variance Std.Dev. Corr   line     sexF 0.015299 
>> 0.12369                  sexM 0.023227 0.15240  0.899  Residual      
>> 0.016824 0.12971        Number of obs: 120, groups: line, 15
>>
>> Fixed effects:
>>             Estimate Std. Error t value
>> (Intercept) 11.45977    0.03606   317.8
>> sexM         0.52991    0.02951    18.0
>>
>> Correlation of Fixed Effects:
>>      (Intr)
>> sexM -0.161
>
>
>


-- 
=============================
Juan Pedro Steibel

Assistant Professor
Statistical Genetics and Genomics

Department of Animal Science & 
Department of Fisheries and Wildlife

Michigan State University
1205-I Anthony Hall
East Lansing, MI
48824 USA 

Phone: 1-517-353-5102
E-mail: steibelj at msu.edu




More information about the R-sig-mixed-models mailing list