[R-sig-ME] need help with mixed effects model
Mark W Kimpel
mwkimpel at gmail.com
Mon Feb 25 07:17:18 CET 2008
What I have been pursuing over the past 2 days is this model, which is
different from what I posted to BioC and similar to yours except it
includes the second gene:
mod <- gene2.expression ~ gene1.expression + Strain + (1|Rat)
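As a minimal sketch of how the formula above could be fitted, the following uses lmer() from the lme4 package on simulated data. The data frame `d`, its column layout, the strain labels "A" and "B", and all simulated effect sizes are assumptions made for illustration only; they are not the real expression data.

```r
library(lme4)

set.seed(1)
# Hypothetical layout: 2 strains x 6 rats per strain x 3 brain regions = 36 samples
d <- expand.grid(Region = factor(1:3),
                 RatInStrain = 1:6,
                 Strain = factor(c("A", "B")))
# Give every rat a distinct label so that (1|Rat) groups samples correctly
d$Rat <- interaction(d$Strain, d$RatInStrain, drop = TRUE)

# Simulated expression values (illustrative numbers, not real data)
rat.effect <- rnorm(nlevels(d$Rat), sd = 0.5)
d$gene1.expression <- rnorm(nrow(d)) + 0.5 * (d$Strain == "B")
d$gene2.expression <- 0.6 * d$gene1.expression +
  rat.effect[as.integer(d$Rat)] + rnorm(nrow(d), sd = 0.5)

# Random intercept per rat; gene1 and Strain as fixed effects
fit <- lmer(gene2.expression ~ gene1.expression + Strain + (1 | Rat), data = d)

# Estimate, standard error and t value for the gene1 coefficient
gene1.row <- coef(summary(fit))["gene1.expression", ]
print(gene1.row)
```

Note that lmer() reports a t value but no p value for the fixed effects, so attaching a significance level to the gene1 coefficient would need a separate step.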
I understand your concern and I rely on your expertise as to whether
this above model is tenable, but let me clarify and give background to
the question I am exploring.
It is very common in gene expression analysis to perform hierarchical
clustering to see which genes are most closely correlated in regards to
expression over a number of samples. The problem with clustering is that
all genes will cluster somewhere, and it is often hard to make sense of
the result. I wish to assign some level of significance to the
correlation between the expression of two genes, admittedly not
independent of one another. If I just had a bunch of rats under no
experimental conditions and took one sample per rat, I would think it
would be reasonable to apply the Pearson Correlation approach and look
at the p value of the slope.
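For the simple case just described (one sample per rat, no experimental factors), a sketch in base R might look like the following. The sample size of 12 and the simulated values are assumptions for illustration. It also shows that the Pearson test and the test of the regression slope give the same p value in simple linear regression.

```r
set.seed(2)
n.rats <- 12                       # hypothetical: one sample per rat
gene1 <- rnorm(n.rats)             # simulated expression, not real data
gene2 <- 0.7 * gene1 + rnorm(n.rats, sd = 0.5)

# Pearson correlation with its test of zero correlation
ct <- cor.test(gene1, gene2, method = "pearson")

# p value of the slope in the regression of gene2 on gene1
slope.p <- coef(summary(lm(gene2 ~ gene1)))["gene1", "Pr(>|t|)"]

# The two tests are equivalent, so the p values agree
print(all.equal(ct$p.value, slope.p))
```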
I, however, have not been presented with such a simple experiment and
recognize that I cannot simply correlate all 36 samples as if they were
independent and came from the same population. In fact the genes that
are most interesting are those that are differentially expressed between
the two strains, and from each rat I have samples from each of 3 brain
regions, which I know from prior work are correlated within rat but
different between brain regions. In fact, it would be nice to be able to
take brain region into account rather than looking at it as a random effect
within animal, but I am pretty sure I run out of degrees of freedom if I
So, it is necessary for me to have both genes in the model to accomplish
what I need to do, and the other factors are, in fact, there merely
to improve power to detect gene-gene correlation by accounting for the
variance induced by these other factors.
So, is this legit or is there a better approach? Thanks so much for your
Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 204-4202 Home (no voice mail please)
Douglas Bates wrote:
> On Fri, Feb 22, 2008 at 11:57 AM, Mark W Kimpel <mwkimpel at gmail.com> wrote:
>> This is my first foray into in mixed models and, while awaiting the
>> arrival of:
>> Extending the Linear Model with R: Generalized Linear, Mixed Effects
>> and Nonparametric Regression Models
>> Mixed Effects Models in S and S-Plus
>> I am in need of some advice.
>> I would like to look at gene-gene correlations within a multi-factorial,
>> mixed effects experiment. Here are the factors, with levels:
>> Gene Expression: 2 different genes per Animal, continuous variable
>> Animals: 6 per Strain
>> Tissues: 3 per animal
>> Strain: 2
>> I thus have 6*3*2 = 36 samples
>> I do not care, for this analysis, about differences between Tissues,
>> Strains, or Animals, in fact, I want to control for them while examining
>> the correlation of expression of the two genes. In other words, I want
>> to look at something very much like the Pearson correlation coefficient
>> controlled for these other factors.
>> I guess the first question I should ask is: "is a mixed model the way to
>> go, and, if not, what would be the correct approach?"
> Perhaps. How do you plan to incorporate the two genes?
>> Assuming mixed models will work, as I see it through my newbie eyes,
>> Tissue and strain are fixed effects and animals are random effects.
> If you were interested in just 1 gene then I would say that this looks
> like a good approach. I'm just not sure what to do about the multiple
>> Any suggestions for an approach and a model?
> The model specification (assuming that each animal has a distinct
> number) would be something like
> gene1 ~ Tissue * Strain + (1|Animal)
> In your earlier message to the Bioconductor list you had a
> specification that looked like
> gene1 ~ gene2 + ...
> which makes me a little queasy because you are assuming that gene2 is
> "known" relative to the variability in gene1 and most of the time that
> is not a reasonable approach.