[R] Analysis of a highly pseudoreplicated mixed-effects experiment
Matthias Gralle
matthias_gralle at eva.mpg.de
Mon Sep 14 13:43:48 CEST 2009
Hello everybody,
I have been trying for some weeks to state the correct design of my
experiment as a GLM formula, and have not been able to find anything
appropriate in Pinheiro & Bates, so I am posting it here and hope
somebody can help me.
In each experimental condition, described by
1) gene (10 levels, fixed, because of high interest to me)
2) species (2 levels, fixed, because of high interest)
3) day (2 levels, random)
4) replicate (2 levels per day, random),
I have several thousand data points consisting of two variables:
5) FITC (level of transfection of a cell)
6) APC (antibody binding to the cell)
Because of intrinsic and uncontrollable cell-to-cell variation, FITC
varies quite uniformly over a wide range, and APC correlates rather well
with FITC. In some of the models below, I pasted day and replicate
together into a single factor, day_repl.
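For concreteness, the combined day_repl factor can be built roughly as in the sketch below. The data frame name `cells` and its toy values are made up for illustration; only the column names come from the description above.

```r
# Toy stand-in for the real data: 2 days x 2 replicates (values are invented)
cells <- data.frame(
  day  = factor(c(1, 1, 2, 2)),
  repl = factor(c(1, 2, 1, 2))
)

# One level per day-by-replicate combination, e.g. "1_2" for day 1, replicate 2
cells$day_repl <- interaction(cells$day, cells$repl, sep = "_", drop = TRUE)

nlevels(cells$day_repl)
```

With 2 days and 2 replicates per day, day_repl has four levels.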
My question is the following:
Is there any gene (in my set of 10 genes) where the species makes a
difference in the relation between FITC and APC? If yes, in which gene
does species have an effect? And what is the effect of the species
difference?
My attempts are the following:
1. Fit the data points of each experimental condition to a linear
equation APC=Intercept+Slope*FITC and analyse the slopes:
lm(Slope~species*gene*day_repl)
This analysis shows clear differences between the genes, but no effect
of species and no interaction gene:species.
The linear fit to the cells is reasonably good, but of course does not
represent the data set completely, so I wanted to incorporate the
complete data set.
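A minimal sketch of this two-step approach, using base R only and simulated data, might look as follows. Everything in it is an assumption made for illustration: the data frame `cells`, the 3 genes instead of 10, the species labels and the simulated values. It also enters day_repl additively rather than with the full three-way interaction of the formula above, so that residual degrees of freedom remain for testing species and gene:species.

```r
set.seed(1)

# Simulated design: 3 genes x 2 species x 4 day_repl levels, 200 cells each
design <- expand.grid(gene     = paste0("g", 1:3),
                      species  = c("A", "B"),
                      day_repl = c("1_1", "1_2", "2_1", "2_2"))
cells <- design[rep(seq_len(nrow(design)), each = 200), ]
cells$FITC <- runif(nrow(cells), 0, 10)
cells$APC  <- 1 + 0.5 * cells$FITC + rnorm(nrow(cells), sd = 0.5)

# Step 1: one linear fit APC ~ FITC per experimental condition, keep the slope
cond <- interaction(cells$gene, cells$species, cells$day_repl, drop = TRUE)
slopes <- do.call(rbind, lapply(split(cells, cond), function(d) {
  data.frame(gene     = d$gene[1],
             species  = d$species[1],
             day_repl = d$day_repl[1],
             Slope    = unname(coef(lm(APC ~ FITC, data = d))["FITC"]))
}))

# Step 2: analyse the per-condition slopes (one row per condition)
fit <- lm(Slope ~ species * gene + day_repl, data = slopes)
anova(fit)
```

Each condition contributes exactly one slope, so the second model has only as many observations as there are conditions, not as many as there are cells.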
2a. lmer(APC~FITC*species*gene+(1|day)+(1|repl))
This gives extremely significant p-values for every variable and
interaction, because there are >200 000 df. Of course, this cannot be
right, because the cells are not really independent. I have tried many
variations of the above, e.g.
2b. lmer(APC~FITC*species*gene+(1|day)+(1+FITC|day_repl)),
but they all suffer from the excess of df.
3. lmer(APC~species*gene+(1|day/repl/FITC)) gives several warning
messages like this one:
In repl:day :
numerical expression has 275591 elements: only the first used
4. lmer(APC~gene*species+(1|day_repl) + (1+gene:species|FITC)) ran
for several days, but failed to converge...
Can somebody give me a hint, or do you think the only possible
analysis is a simplification as in my model 1?
By the way, I am using R version 2.8.0 (2008-10-20) on Ubuntu 8.04 on a
linux 2.6.24-24-generic kernel on different Intel systems. I am using
the lme4 that came with R 2.8.0.
Thank you very much for your time!
-- Matthias Gralle, PhD
Dept. Evolutionary Genetics
Max Planck Institute for Evolutionary Anthropology
Deutscher Platz 6
04103 Leipzig, Germany
Tel +49 341 3550 519
Fax +49 341 3550 555