[R-sig-ME] need help with mixed effects model

Mon Feb 25 07:17:18 CET 2008

Doug,

What I have been pursuing over the past 2 days is this model, which is 
different from what I posted to BioC and similar to yours except it 
includes the second gene:

  mod <- gene2.expression ~ gene1.expression + Strain + (1|Rat)

I understand your concern and I rely on your expertise as to whether 
the model above is tenable, but let me clarify and give background to 
the question I am exploring.
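
To make the model above concrete, here is a minimal sketch of how it might be fit with lmer() from lme4. The data frame `dat`, the factor labels, and all simulated effect sizes are hypothetical stand-ins for the real 36 samples, not values from the experiment.

```r
# Hypothetical stand-in for the real data: 2 strains x 6 rats x 3 regions = 36 rows.
set.seed(1)
dat <- expand.grid(Region = factor(c("R1", "R2", "R3")),
                   Rat    = factor(sprintf("rat%02d", 1:12)))
idx <- as.integer(dat$Rat)
dat$Strain <- factor(ifelse(idx <= 6, "A", "B"))

# Made-up rat-level effects so that samples from the same rat are correlated.
rat.g1 <- rnorm(12, sd = 0.5)
rat.g2 <- rnorm(12, sd = 0.5)
dat$gene1.expression <- 8 + 0.5 * (dat$Strain == "B") + rat.g1[idx] +
  rnorm(36, sd = 0.3)
dat$gene2.expression <- 2 + 0.6 * dat$gene1.expression +
  0.4 * (dat$Strain == "B") + rat.g2[idx] + rnorm(36, sd = 0.3)

# Fit only if lme4 is installed; the random intercept (1 | Rat) groups
# the three samples taken from each animal.
if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::lmer(gene2.expression ~ gene1.expression + Strain + (1 | Rat),
                    data = dat)
  print(summary(fit))
}
```

The coefficient on gene1.expression is the quantity of interest here; whether treating gene1 as a fixed, known covariate is defensible is exactly the question being asked.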

It is very common in gene expression analysis to perform hierarchical 
clustering to see which genes are most closely correlated with regard to 
expression over a number of samples. The problem with clustering is that 
all genes will cluster somewhere, and it is often hard to make sense of 
the result. I wish to assign some level of significance to the 
correlation between the expression of two genes, admittedly not 
independent of one another. If I just had a bunch of rats under no 
experimental conditions and took one sample per rat, I would think it 
would be reasonable to apply the Pearson Correlation approach and look 
at the p value of the slope.
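
For that simple case (one independent sample per rat), a small sketch with simulated values shows that the Pearson test and the test on the regression slope give the same answer. The vectors `g1` and `g2` are made up for illustration.

```r
# Hypothetical data: one sample per rat, 12 rats, no grouping structure.
set.seed(2)
g1 <- rnorm(12, mean = 8)
g2 <- 0.7 * g1 + rnorm(12, sd = 0.5)

# Pearson correlation test.
ct <- cor.test(g1, g2, method = "pearson")

# Simple linear regression; pull out the row for the slope.
slope <- summary(lm(g2 ~ g1))$coefficients["g1", ]

# The t statistic and p value of the slope match those of the Pearson test.
print(c(pearson.t = unname(ct$statistic), slope.t = slope[["t value"]]))
print(c(pearson.p = ct$p.value, slope.p = slope[["Pr(>|t|)"]]))
```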

I, however, have not been presented with such a simple experiment and 
recognize that I cannot simply correlate all 36 samples as if they were 
independent and came from the same population. In fact the genes that 
are most interesting are those that are differentially expressed between 
the two strains, and from each rat I have samples from each of 3 brain 
regions, which I know from prior work are correlated within rat but 
different between brain regions. In fact, it would be nice to be able to 
take brain region into account rather than looking at it as a random effect 
within animal, but I am pretty sure I run out of degrees of freedom if I 
do that.
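
One rough way to check the degrees-of-freedom worry is to count the fixed-effect columns the model would need if brain region were added. This is only a sketch: the `design` data frame and the Region labels are hypothetical, and a parameter count by itself does not say whether the model is sensible.

```r
# Hypothetical layout: 12 rats (6 per strain) x 3 brain regions = 36 samples.
set.seed(3)
design <- expand.grid(Region = factor(c("R1", "R2", "R3")),
                      Rat    = factor(1:12))
design$Strain <- factor(ifelse(as.integer(design$Rat) <= 6, "A", "B"))
design$gene1.expression <- rnorm(nrow(design))  # placeholder covariate

# Fixed-effects design matrix with Region included as a fixed effect.
X <- model.matrix(~ gene1.expression + Strain + Region, data = design)

# Columns: intercept, gene1 slope, 1 Strain contrast, 2 Region contrasts.
print(colnames(X))
print(c(n.obs = nrow(X), n.fixed = ncol(X)))
```

A random intercept for Rat would add one variance parameter on top of these columns, plus the residual variance.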

So, it is necessary for me to have both genes in the model to accomplish 
what I need to do, and the other factors are, in fact, there merely 
to improve power to detect gene-gene correlation by accounting for the 
variance induced by these other factors.

So, is this legit or is there a better approach? Thanks so much for your 
help.

Mark

Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 204-4202 Home (no voice mail please)

mwkimpel<at>gmail<dot>com

******************************************************************

Douglas Bates wrote:
> On Fri, Feb 22, 2008 at 11:57 AM, Mark W Kimpel <mwkimpel at gmail.com> wrote:
>> This is my first foray into mixed models and, while awaiting the
>>  arrival of:
> 
>>  Extending the Linear Model with R: Generalized Linear, Mixed Effects
>>  and Nonparametric Regression Models
>>  Mixed Effects Models in S and S-Plus
> 
>>  I am in need of some advice.
> 
>>  I would like to look at gene-gene correlations within a multi-factorial,
>>  mixed effects experiment. Here are the factors, with levels:
> 
>>  Gene Expression: 2 different genes per Animal, continuous variable
>>  Animals: 6 per Strain
>>  Tissues: 3 per animal
> 
>> Strain: 2
> 
>>  I thus have 6*3*2 = 36 samples
> 
>>  I do not care, for this analysis, about differences between Tissues,
>>  Strains, or Animals, in fact, I want to control for them while examining
>>  the correlation of expression of the two genes. In other words, I want to
>>  look at something very much like the Pearson correlation coefficient
>>  controlled for these other factors.
> 
>>  I guess the first question I should ask is: "is a mixed model the way to
>>  go, and, if not, what would be the correct approach?"
> 
> Perhaps.  How do you plan to incorporate the two genes?
> 
>>  Assuming mixed models will work, as I see it through my newbie eyes,
>>  Tissue and strain are fixed effects and animals are random effects.
> 
> If you were interested in just 1 gene then I would say that this looks
> like a good approach.  I'm just not sure what to do about the multiple
> genes.
> 
>>  Any suggestions for an approach and a model?
> 
> The model specification (assuming that each animal has a distinct
> number) would be something like
> 
> gene1 ~ Tissue * Strain + (1|Animal)
> 
> In your earlier message to the Bioconductor list you had a
> specification that looked like
> 
> gene1 ~ gene2 + ...
> 
> which makes me a little queasy because you are assuming that gene2 is
> "known" relative to the variability in gene1 and most of the time that
> is not a reasonable approach.
>