[R-sig-ME] Mixed model (with interaction) for gene expression and iteration
Paolo Innocenti
paolo.innocenti at ebc.uu.se
Thu Jun 4 11:38:22 CEST 2009
Sorry for replying to my own emails, but I found partial solution to my
problems in Pinheiro and Bates at:
2.4.1 - Likelihood Ratio Tests (p.83)
to calculate logLik and p.values for random effects
and
2.4.2 - Hypothesis tests for Fixed-Effect Terms (p.87)
that is self-explanatory. Although I still don't understand how to
calculate the "denominator degrees of freedom".
Best,
paolo
PS. Off-topic quick question: is there currently a "out-of-the-box"
solution to get credible intervals for random effects in mcmcsamp()?
HPDinterval (from lme4 package) doesn't seem to work for that?
Paolo Innocenti wrote:
> Dear Douglas, Rolf, Juan and list,
>
> thank you very much for your replies.
> I now got a good working model, and the use of refit and VarCorr will
> definitely help.
>
> I had a go with mcmcsamp(), and I must confirm that this approach is not
> feasible, both computationally and because if you get a "false
> convergence" for, say, 1 gene out of 20, it becomes impossible to go
> back and fix all the errors.
>
> So, the alternative approach seems more promising. If I understand
> correctly, you suggest to calculate a p-value for random effects out of
> the LRT (Likelihood ratio test), and use approximated DFs to calculate
> standard p-values for the fixed effects.
>
> I neved used this approach, so I appreciate if you can point me in the
> right direction.
>
> Random effects:
> I'd need to compare the this three model:
> m1a <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))
> m1b <- lmer(Y1 ~ sex + (1|line))
> m1c <- lmer(Y1 ~ sex)
>
> and get the effect of the interaction from m1a-m1b,
> and the effect of "line" from m1b-m1c.
> It that correct?
> Can anyone point me to any kind of documentation/examples to sort out
> the details?
>
> Fixed effects:
> I don't really know where to get the approximated degrees of freedom.
> Can you point me to an example?
>
> Thanks again for all the help. Eventually, I'll be really happy to share
> my experience/code when everything is sorted, even if I doubt I can add
> anything helpful.
>
> Best,
> paolo
>
> PS. I don't really understand what you mean by
> sobj<-summary(result)
> what object is your "result" here?
>
>
>
>
> Juan Pedro Steibel wrote:
>> Hello Paolo,
>> We are also starting to use lmer for gene expression analysis
>> (genetical genomics too) so here are my thoughts.
>>
>> Having two equivalent parameterizations, I would go with the
>> computationally fastest one (a couple more miliseconds per model,
>> easily add up to many hours when analyzing highly dimensional
>> datasets). You can time the analysis for, say, 100 transcripts and go
>> from there.
>>
>> ~note: other users commented on your models not being equivalent~
>>
>> For p-values:
>>
>> You could use LRT to test for variance components.
>>
>> This is standard practice in genetic epidemiology: fit a model with
>> and without the random effect in question, then compare the log
>> (residual) likelihood ratio to a chi-square statistic. FDR can be used
>> on top of that to account for multiple tests. Of course, now you have
>> to fit three models (one null models for each VC), so your cpu time
>> has just multiplied by almost 3. Definitely using refit and update
>> will help with compute time when having so many models.
>>
>> I use sobj<-summary(result) as an intermediate step to get the info
>> (although this may add to the computational burden and other
>> suggestions you got may be faster and more efficient),
>>
>> then ask for slots:
>> @REmat
>> @coefs
>> ...for getting estimates of variance components and fixed effects.
>>
>> @coefs gives you a t-statistic for fixed effects too... you could take
>> a stab at an approximated df method and compute an (approximated)
>> p-value.
>> I know that doing so can attract a lot of criticism in this list, but
>> when you are fitting a several million models (10000s of transcripts
>> and 1000s of genomic positions as in my case), the mcmc approximation
>> is (unfortunately) not computationally feasible.
>>
>>
>> Hope this helps!
>> JP
>>
>>
>>
>> Paolo Innocenti wrote:
>>> Dear Douglas and list,
>>>
>>> I am thinking about fitting a mixed model for a microarray experiment
>>> using lme4, since other specific software seems not suitable for my
>>> purposes. I'll briefly describe the model and kindly ask for
>>> suggestions on the model and the workflow I can use to get useful
>>> results.
>>>
>>> My response variable Y is gene expression levels for a given gene
>>> (say g_i) from 120 samples.
>>> The factor I want to include are:
>>>
>>> Sex: fixed, two levels, M/F.
>>> Line: 15 randomly picked genotypes from a large outbred population.
>>>
>>> I am interested in:
>>> - if the gene is differentially expressed in the 2 sexes (effect of
>>> sex), in the 15 lines (effect of line) and the interaction of the two.
>>>
>>> - the variance component of line = how much of the variance is due to
>>> the genotype
>>>
>>> - the variance component of the interaction = the genetic variation
>>> for sexual dimorphism.
>>>
>>> Reading a bit of this mailing list, I came up with these three models:
>>>
>>> m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))
>>>
>>> or
>>>
>>> m2 <- lmer(Y1 ~ sex + (sex|line))
>>>
>>> or
>>>
>>> m3 <- lmer(Y1 ~ sex + (0 + sex|line))
>>>
>>> Which should all be the same model (and indeed they have all the same
>>> residuals) but different parametrization (see self-contained example
>>> below).
>>>
>>> Now, in the light of my needs (see above), which model makes it
>>> easier to extract the components I need? Also, do they make different
>>> assumptions - as different levels of independency among levels of
>>> random factors?
>>>
>>> I will need to be able to extract the variance component values in an
>>> iterative process (i have 18.000 genes): is VarCorr() the way to go?
>>>
>>> VarCorr(m1)$'sex:line'[1]
>>> VarCorr(m1)$'line'[1]
>>>
>>> Last two question: what is the easier way to assess, in an iterative
>>> process, normality of residuals, and what is a sensible way to assess
>>> significant differential expression of genes (since I guess I can't
>>> get p-values and then apply FDR correction?)
>>>
>>> Thanks a lot for reading so far and I'll be grateful for any kind of
>>> help.
>>> paolo
>>>
>>> Self-contained example:
>>>
>>> Y1 <- as.numeric(
>>> c("11.6625","11.3243","11.7819","11.5032","11.7578","11.9379","11.8491",
>>> "11.9035","11.2042","11.0344","11.5137","11.1995","11.6327","11.7392",
>>> "11.9869","11.6955","11.5631","11.7159","11.8435","11.5814","12.0756",
>>> "12.3428","12.3342","11.9883","11.6067","11.6102","11.6517","11.4444",
>>> "11.9567","12.0478","11.9683","11.8207","11.5860","11.6028","11.6522",
>>> "11.6775","12.3570","12.2266","12.2444","12.1369","11.2573","11.4577",
>>> "11.4432","11.2994","11.8486","11.9258","11.9864","11.9964","11.2806",
>>> "11.2527","11.3672","11.0791","11.9501","11.7223","11.9825","11.8114",
>>> "11.6116","11.4284","11.3731","11.6942","12.2153","12.0101","12.2150",
>>> "12.1932","11.5617","11.3761","11.4813","11.7503","11.9889","12.1530",
>>> "12.3299","12.4436","11.4566","11.4614","11.5527","11.3513","11.9764",
>>> "11.8810","12.0999","11.9083","11.4870","11.6764","11.3973","11.4507",
>>> "12.1141","11.9906","12.1118","11.9728","11.3382","11.4146","11.4590",
>>> "11.2527","12.1101","12.0448","12.2191","11.8317","11.3982","11.3555",
>>> "11.3897","11.7731","11.9749","11.8666","12.1984","12.0350","11.4642",
>>> "11.4509","11.5552","11.4346","12.0714","11.7136","11.9019","11.8158",
>>> "11.3132","11.3121","11.1612","11.2073","11.6658","11.7879","11.7847",
>>> "11.5300"))
>>> sex <- factor(rep(c("F","M"), 15, each=4))
>>> line <- factor(rep(1:15, each=8))
>>> m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))
>>> m2 <- lmer(Y1 ~ sex + (sex|line))
>>> m3 <- lmer(Y1 ~ sex + (0 + sex|line))
>>> VarCorr(m1)$'sex:line'[1]
>>> VarCorr(m1)$'line'[1]
>>>
>>> Output:
>>>
>>>>> m1
>>>> Linear mixed model fit by REML Formula: Y1 ~ sex + (1 | line) + (1 |
>>>> sex:line) AIC BIC logLik deviance REMLdev
>>>> -91.13 -77.2 50.57 -111.1 -101.1
>>>> Random effects:
>>>> Groups Name Variance Std.Dev.
>>>> sex:line (Intercept) 0.0023237 0.048205
>>>> line (Intercept) 0.0169393 0.130151
>>>> Residual 0.0168238 0.129707
>>>> Number of obs: 120, groups: sex:line, 30; line, 15
>>>>
>>>> Fixed effects:
>>>> Estimate Std. Error t value
>>>> (Intercept) 11.45977 0.03955 289.72
>>>> sexM 0.52992 0.02951 17.96
>>>>
>>>> Correlation of Fixed Effects:
>>>> (Intr)
>>>> sexM -0.373
>>>>> m2
>>>> Linear mixed model fit by REML Formula: Y1 ~ sex + (sex | line)
>>>> AIC BIC logLik deviance REMLdev
>>>> -90 -73.27 51 -112.1 -102
>>>> Random effects:
>>>> Groups Name Variance Std.Dev. Corr line (Intercept)
>>>> 0.0152993 0.123691 sexM 0.0046474 0.068172
>>>> 0.194 Residual 0.0168238 0.129707 Number of obs:
>>>> 120, groups: line, 15
>>>>
>>>> Fixed effects:
>>>> Estimate Std. Error t value
>>>> (Intercept) 11.45977 0.03606 317.8
>>>> sexM 0.52992 0.02951 18.0
>>>>
>>>> Correlation of Fixed Effects:
>>>> (Intr)
>>>> sexM -0.161
>>>>> m3
>>>> Linear mixed model fit by REML Formula: Y1 ~ sex + (0 + sex | line)
>>>> AIC BIC logLik deviance REMLdev
>>>> -90 -73.27 51 -112.1 -102
>>>> Random effects:
>>>> Groups Name Variance Std.Dev. Corr line sexF 0.015299
>>>> 0.12369 sexM 0.023227 0.15240 0.899 Residual
>>>> 0.016824 0.12971 Number of obs: 120, groups: line, 15
>>>>
>>>> Fixed effects:
>>>> Estimate Std. Error t value
>>>> (Intercept) 11.45977 0.03606 317.8
>>>> sexM 0.52991 0.02951 18.0
>>>>
>>>> Correlation of Fixed Effects:
>>>> (Intr)
>>>> sexM -0.161
>>>
>>>
>>>
>>
>>
>
--
Paolo Innocenti
Department of Animal Ecology, EBC
Uppsala University
Norbyvägen 18D
75236 Uppsala, Sweden
More information about the R-sig-mixed-models
mailing list