[R-sig-ME] Mixed model (with interaction) for gene expression and iteration

Paolo Innocenti paolo.innocenti at ebc.uu.se
Thu Jun 4 11:38:22 CEST 2009


Sorry for replying to my own emails, but I found partial solution to my 
problems in Pinheiro and Bates at:

2.4.1 - Likelihood Ratio Tests (p.83)
to calculate logLik and p.values for random effects

and

2.4.2 - Hypothesis tests for Fixed-Effect Terms (p.87)
that is self-explanatory. Although I still don't understand how to 
calculate the "denominator degrees of freedom".

Best,
paolo

PS. Off-topic quick question: is there currently a "out-of-the-box" 
solution to get credible intervals for random effects in mcmcsamp()?
HPDinterval (from lme4 package) doesn't seem to work for that?




Paolo Innocenti wrote:
> Dear Douglas, Rolf, Juan and list,
> 
> thank you very much for your replies.
> I now got a good working model, and the use of refit and VarCorr will 
> definitely help.
> 
> I had a go with mcmcsamp(), and I must confirm that this approach is not 
> feasible, both computationally and because if you get a "false 
> convergence" for, say, 1 gene out of 20, it becomes impossible to go 
> back and fix all the errors.
> 
> So, the alternative approach seems more promising. If I understand 
> correctly, you suggest to calculate a p-value for random effects out of 
> the LRT (Likelihood ratio test), and use approximated DFs to calculate 
> standard p-values for the fixed effects.
> 
> I neved used this approach, so I appreciate if you can point me in the 
> right direction.
> 
> Random effects:
> I'd need to compare the this three model:
> m1a <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))
> m1b <- lmer(Y1 ~ sex + (1|line))
> m1c <- lmer(Y1 ~ sex)
> 
> and get the effect of the interaction from m1a-m1b,
> and the effect of "line" from m1b-m1c.
> It that correct?
> Can anyone point me to any kind of documentation/examples to sort out 
> the details?
> 
> Fixed effects:
> I don't really know where to get the approximated degrees of freedom. 
> Can you point me to an example?
> 
> Thanks again for all the help. Eventually, I'll be really happy to share 
> my experience/code when everything is sorted, even if I doubt I can add 
> anything helpful.
> 
> Best,
> paolo
> 
> PS. I don't really understand what you mean by
> sobj<-summary(result)
> what object is your "result" here?
> 
> 
> 
> 
> Juan Pedro Steibel wrote:
>> Hello Paolo,
>> We are also starting to use lmer for gene expression analysis 
>> (genetical genomics too) so here are my thoughts.
>>
>> Having two equivalent parameterizations, I would go with the 
>> computationally fastest one (a couple more miliseconds per model, 
>> easily add up to many hours when analyzing highly dimensional 
>> datasets). You can time the analysis for, say, 100 transcripts and go 
>> from there.
>>
>> ~note: other users commented on your models not being equivalent~
>>
>> For p-values:
>>
>> You could use LRT to test for variance components.
>>
>> This is standard practice in genetic epidemiology: fit a model with 
>> and without the random effect in question, then compare the log 
>> (residual) likelihood ratio to a chi-square statistic. FDR can be used 
>> on top of that to account for multiple tests. Of course, now you have 
>> to fit three models (one null models for each VC), so your cpu  time 
>> has just multiplied by almost 3. Definitely using refit and update 
>> will help with compute time when having so many models.
>>
>> I use sobj<-summary(result) as an intermediate step to get the info 
>> (although this may add to the computational burden and other 
>> suggestions you got may be faster and more efficient),
>>
>> then ask for slots:
>> @REmat
>> @coefs
>> ...for getting estimates of variance components and fixed effects.
>>
>> @coefs gives you a t-statistic for fixed effects too... you could take 
>> a stab at an approximated df method and compute an (approximated) 
>> p-value.
>> I know that doing so can attract a lot of criticism in this list, but 
>> when you are fitting a several million models (10000s of transcripts 
>> and 1000s of genomic positions as in my case), the mcmc approximation 
>> is (unfortunately) not computationally feasible.
>>
>>
>> Hope this helps!
>> JP
>>
>>
>>
>> Paolo Innocenti wrote:
>>> Dear Douglas and list,
>>>
>>> I am thinking about fitting a mixed model for a microarray experiment 
>>> using lme4, since other specific software seems not suitable for my 
>>> purposes. I'll briefly describe the model and kindly ask for 
>>> suggestions on the model and the workflow I can use to get useful 
>>> results.
>>>
>>> My response variable Y is gene expression levels for a given gene 
>>> (say g_i) from 120 samples.
>>> The factor I want to include are:
>>>
>>> Sex: fixed, two levels, M/F.
>>> Line: 15 randomly picked genotypes from a large outbred population.
>>>
>>> I am interested in:
>>> - if the gene is differentially expressed in the 2 sexes (effect of 
>>> sex), in the 15 lines (effect of line) and the interaction of the two.
>>>
>>> - the variance component of line = how much of the variance is due to 
>>> the genotype
>>>
>>> - the variance component of the interaction = the genetic variation 
>>> for sexual dimorphism.
>>>
>>> Reading a bit of this mailing list, I came up with these three models:
>>>
>>> m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))
>>>
>>> or
>>>
>>> m2 <- lmer(Y1 ~ sex + (sex|line))
>>>
>>> or
>>>
>>> m3 <- lmer(Y1 ~ sex + (0 + sex|line))
>>>
>>> Which should all be the same model (and indeed they have all the same 
>>> residuals) but different parametrization (see self-contained example 
>>> below).
>>>
>>> Now, in the light of my needs (see above), which model makes it 
>>> easier to extract the components I need? Also, do they make different 
>>> assumptions - as different levels of independency among levels of 
>>> random factors?
>>>
>>> I will need to be able to extract the variance component values in an 
>>> iterative process (i have 18.000 genes): is VarCorr() the way to go?
>>>
>>> VarCorr(m1)$'sex:line'[1]
>>> VarCorr(m1)$'line'[1]
>>>
>>> Last two question: what is the easier way to assess, in an iterative 
>>> process, normality of residuals, and what is a sensible way to assess 
>>> significant differential expression of genes (since I guess I can't 
>>> get p-values and then apply FDR correction?)
>>>
>>> Thanks a lot for reading so far and I'll be grateful for any kind of 
>>> help.
>>> paolo
>>>
>>> Self-contained example:
>>>
>>> Y1 <- as.numeric(
>>> c("11.6625","11.3243","11.7819","11.5032","11.7578","11.9379","11.8491",
>>> "11.9035","11.2042","11.0344","11.5137","11.1995","11.6327","11.7392",
>>> "11.9869","11.6955","11.5631","11.7159","11.8435","11.5814","12.0756",
>>> "12.3428","12.3342","11.9883","11.6067","11.6102","11.6517","11.4444",
>>> "11.9567","12.0478","11.9683","11.8207","11.5860","11.6028","11.6522",
>>> "11.6775","12.3570","12.2266","12.2444","12.1369","11.2573","11.4577",
>>> "11.4432","11.2994","11.8486","11.9258","11.9864","11.9964","11.2806",
>>> "11.2527","11.3672","11.0791","11.9501","11.7223","11.9825","11.8114",
>>> "11.6116","11.4284","11.3731","11.6942","12.2153","12.0101","12.2150",
>>> "12.1932","11.5617","11.3761","11.4813","11.7503","11.9889","12.1530",
>>> "12.3299","12.4436","11.4566","11.4614","11.5527","11.3513","11.9764",
>>> "11.8810","12.0999","11.9083","11.4870","11.6764","11.3973","11.4507",
>>> "12.1141","11.9906","12.1118","11.9728","11.3382","11.4146","11.4590",
>>> "11.2527","12.1101","12.0448","12.2191","11.8317","11.3982","11.3555",
>>> "11.3897","11.7731","11.9749","11.8666","12.1984","12.0350","11.4642",
>>> "11.4509","11.5552","11.4346","12.0714","11.7136","11.9019","11.8158",
>>> "11.3132","11.3121","11.1612","11.2073","11.6658","11.7879","11.7847",
>>> "11.5300"))
>>> sex <- factor(rep(c("F","M"), 15, each=4))
>>> line <- factor(rep(1:15, each=8))
>>> m1 <- lmer(Y1 ~ sex + (1|line) + (1|sex:line))
>>> m2 <- lmer(Y1 ~ sex + (sex|line))
>>> m3 <- lmer(Y1 ~ sex + (0 + sex|line))
>>> VarCorr(m1)$'sex:line'[1]
>>> VarCorr(m1)$'line'[1]
>>>
>>> Output:
>>>
>>>>> m1
>>>> Linear mixed model fit by REML Formula: Y1 ~ sex + (1 | line) + (1 | 
>>>> sex:line)     AIC   BIC logLik deviance REMLdev
>>>>  -91.13 -77.2  50.57   -111.1  -101.1
>>>> Random effects:
>>>>  Groups   Name        Variance  Std.Dev.
>>>>  sex:line (Intercept) 0.0023237 0.048205
>>>>  line     (Intercept) 0.0169393 0.130151
>>>>  Residual             0.0168238 0.129707
>>>> Number of obs: 120, groups: sex:line, 30; line, 15
>>>>
>>>> Fixed effects:
>>>>             Estimate Std. Error t value
>>>> (Intercept) 11.45977    0.03955  289.72
>>>> sexM         0.52992    0.02951   17.96
>>>>
>>>> Correlation of Fixed Effects:
>>>>      (Intr)
>>>> sexM -0.373
>>>>> m2
>>>> Linear mixed model fit by REML Formula: Y1 ~ sex + (sex | line) 
>>>>  AIC    BIC logLik deviance REMLdev
>>>>  -90 -73.27     51   -112.1    -102
>>>> Random effects:
>>>>  Groups   Name        Variance  Std.Dev. Corr   line     (Intercept) 
>>>> 0.0152993 0.123691                 sexM        0.0046474 0.068172 
>>>> 0.194  Residual             0.0168238 0.129707       Number of obs: 
>>>> 120, groups: line, 15
>>>>
>>>> Fixed effects:
>>>>             Estimate Std. Error t value
>>>> (Intercept) 11.45977    0.03606   317.8
>>>> sexM         0.52992    0.02951    18.0
>>>>
>>>> Correlation of Fixed Effects:
>>>>      (Intr)
>>>> sexM -0.161
>>>>> m3
>>>> Linear mixed model fit by REML Formula: Y1 ~ sex + (0 + sex | line) 
>>>>  AIC    BIC logLik deviance REMLdev
>>>>  -90 -73.27     51   -112.1    -102
>>>> Random effects:
>>>>  Groups   Name Variance Std.Dev. Corr   line     sexF 0.015299 
>>>> 0.12369                  sexM 0.023227 0.15240  0.899  Residual      
>>>> 0.016824 0.12971        Number of obs: 120, groups: line, 15
>>>>
>>>> Fixed effects:
>>>>             Estimate Std. Error t value
>>>> (Intercept) 11.45977    0.03606   317.8
>>>> sexM         0.52991    0.02951    18.0
>>>>
>>>> Correlation of Fixed Effects:
>>>>      (Intr)
>>>> sexM -0.161
>>>
>>>
>>>
>>
>>
> 

-- 
Paolo Innocenti
Department of Animal Ecology, EBC
Uppsala University
Norbyvägen 18D
75236 Uppsala, Sweden




More information about the R-sig-mixed-models mailing list