[R-sig-ME] need help with mixed effects model

Fri Feb 29 18:29:25 CET 2008

Doug and other mixed-models aficionados,

I have made some progress on my own on the problem I posted in this 
thread. Briefly, I am analyzing a multifactoral genomic experiment and 
wish to look at gene-gene correlations independent of Strain. Because 
multiple measurements are taken per rat, I wish to use lmer. What seems 
to be working is the following.

mod1 <- lmer(gene2 ~ -1 + Strain + (1|Rat) + gene1)
mod2 <- lmer(gene2 ~ -1 + Strain + (1|Rat))
anova.sum <- anova(mod1, mod2)

I look to see if adding the expression of the other gene of interest as 
a covariate significantly improves the model, if it does, then I take 
that as an indicator of gene-gene correlation/dependence.

I am not doing this, of course for just two genes, but build an 
adjacency matrix out of the p-values for all gene-gene interactions in a 
list of about 400 sig. genes. I then adjust the p-values for FDR and 
pick a suitable FDR (0.001 in this case) as a threshold and create 
another adjacency matrix with 1's for significant correlation and 0's 
for non-significant. I then visualize this using Rgraphviz.

As I was tearing my hair out trying to make sure this was sensical, it 
occurred to me that within my list of 400 genes I have positive 
controls. About 40 of the genes are represented by 2 or more probesets, 
which should be highly correlated if they are measuring the same thing. 
So, I subjected just genes with duplicate probesets to the above 
procedure and, sure enough, in an overwhelming number of cases, 
probesets from the same gene plot next to each other.

My conclusion from this exercise is that what I am doing is empirically 
correct, although I am open to suggestions as to how it could be 
improved or comments as to how I may be just plain wrong.

Doug, I am reading your book and appreciate your contributions.

Mark

Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 204-4202 Home (no voice mail please)

mwkimpel<at>gmail<dot>com

******************************************************************

Douglas Bates wrote:
> On Fri, Feb 22, 2008 at 11:57 AM, Mark W Kimpel <mwkimpel at gmail.com> wrote:
>> This is my first foray into in mixed models and, while awaiting the
>>  arrival of:
> 
>>  Extending the Linear Model with R: Generalized Linear, Mixed Effects
>>  and     Nonparametric Regression Models
>>  Mixed Effects Models in S and S-Plus
> 
>>  I am in need to some advice.
> 
>>  I would like to look at gene-gene correlations within a multi-factorial,
>>  mixed effects experiment. Here are the factors, with levels:
> 
>>  Gene Expression: 2 different genes per Animal, continuous variable
>>  Animals: 6 per Strain
>>  Tissues: 3 per animal
> 
>> Strain: 2
> 
>>  I thus have 6*3*2 = 36 samples
> 
>>  I do not care, for this analysis, about differences between Tissues,
>>  Strains, or Animals, in fact, I want to control for them while examining
>>  the correlation of expression of the two genes. In other words, I want
>>  look at something very much like the Pearson correlation coefficient
>>  controlled for these other factors.
> 
>>  I guess the first question I should ask is: "is a mixed model the way to
>>  go, and, if not, what would be the correct approach?"
> 
> Perhaps.  How do you plan to incorporate the two genes?
> 
>>  Assuming mixed models will work, as I see it through my newbie eyes,
>>  Tissue and strain are fixed effects and animals are random effects.
> 
> If you were interested in just 1 gene than I would say that this looks
> like a good approach.  I'm just not sure what to do about the multiple
> genes.
> 
>>  Any suggestions for an approach and a model?
> 
> The model specification (assuming that each animal has a distinct
> number) would be something like
> 
> gene1 ~ Tissue * Strain + (1|Animal)
> 
> In your earlier message to the Bioconductor list you had a
> specification that looked like
> 
> gene1 ~ gene2 + ...
> 
> which makes me a little queasy because you are assuming that gene2 is
> "known" relative to the variability in gene1 and most of the time that
> is not a reasonable approach.
>