[R-sig-ME] Problems with formula for highly pseudoreplicated mixed-effects system

Thu Sep 17 18:53:02 CEST 2009

Thank you, David!

I have tried out a variant of your formula (first pasting gene and
species together into a single factor, gene_species):

 > duffymodel <- lmer(APC ~ gene*species + FITC + (FITC|gene) +
 +     (FITC|gene_species) + (FITC|species) + (1|day) + (1|day_repl),
 +     data = small)
 > duffymodelred <- lmer(APC ~ gene + species + FITC + (FITC|gene) +
 +     (FITC|species) + (1|day) + (1|day_repl), data = small)
 > anova(duffymodel, duffymodelred)
Data: small
Models:
duffymodelred: APC ~ gene + species + FITC + (FITC | gene) + (FITC | species) +
duffymodelred:     (1 | day) + (1 | day_repl)
duffymodel: APC ~ gene * species + FITC + (FITC | gene) + (FITC | gene_species) +
duffymodel:     (FITC | species) + (1 | day) + (1 | day_repl)
              Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
duffymodelred 13 8625.3 8686.2 -4299.7
duffymodel    17 8562.3 8641.9 -4264.1 71.068      4  1.350e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

At first sight, this would seem to mean that gene:species is indeed a 
highly significant interaction. Now, what I meant by pseudoreplication 
is that the several hundred (or in my full data set, several thousand) 
cells in each experimental condition are not independent. They were all 
treated in exactly the same way, but due to intrinsic variability they 
express widely varying amounts of protein. I see in my data that 
replicate experiments sometimes differ strongly in their APC ~ FITC slope,
even though each replicate contains several thousand data points. My 
results seem to be much more reliable when I repeat the experiment 
several times with independent cell treatment than when I increase the 
number of cells measured in one experimental condition. That is why I 
don't know if I can trust "your" formula.
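To make the replicate-as-unit argument concrete, here is a minimal sketch that swaps in a simple two-stage (summary-statistics) analysis instead of the mixed model discussed in this thread: one APC ~ FITC slope is estimated per independent replicate, and gene:species is then tested on those slopes. It uses base R only; the data are simulated, and the level names (sp1/sp2, d1/d2, r1/r2) and the columns unit and true_slope are hypothetical. The point is only that the interaction is then judged against 59 residual df from 80 replicates, not against tens of thousands of cells.

```r
# Two-stage sketch (hypothetical, simulated data): replicates, not cells,
# are treated as the units of replication.
set.seed(2)
# 10 genes x 2 species x 2 days x 2 replicates per day = 80 replicates
design <- expand.grid(gene = paste0("g", 1:10), species = c("sp1", "sp2"),
                      day = c("d1", "d2"), repl = c("r1", "r2"))
design$unit <- seq_len(nrow(design))
# Assumed between-replicate slope variation (no true gene or species effect)
design$true_slope <- 0.5 + rnorm(nrow(design), sd = 0.2)
# 500 simulated cells per replicate; cells share their replicate's slope
cells <- design[rep(design$unit, each = 500), ]
cells$FITC <- rexp(nrow(cells))
cells$APC <- cells$true_slope * cells$FITC + rnorm(nrow(cells), sd = 0.3)

# Stage 1: one OLS slope of APC on FITC per replicate
slopes <- sapply(split(cells, cells$unit),
                 function(d) coef(lm(APC ~ FITC, data = d))[["FITC"]])
design$slope <- slopes[as.character(design$unit)]

# Stage 2: test gene:species on the 80 replicate-level slopes,
# with day as a fixed block (only two days are available)
fit <- lm(slope ~ day + gene * species, data = design)
tab <- anova(fit)
tab
```

Adding more cells per replicate only makes each stage-1 slope more precise; it does not add residual degrees of freedom in stage 2, which matches the observation that repeating the experiment helps more than measuring more cells.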

Similarly, when I drop either (1|day) or (1|day_repl) from the formula,
the likelihood-ratio test is also highly significant, but I don't trust
that either.

The output of hatTrace might show more clearly what I mean: the model has
too many residual degrees of freedom.

I'm sorry, I don't know anything about testing for homogeneity of the
partial correlation coefficients. I can try to look it up, or can you
give me a hint?
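One standard way to test whether correlations are equal across strata is the Fisher z-transformation test; whether this is exactly what David had in mind is an assumption. A minimal base-R sketch on simulated data follows: only the column names gene, FITC and APC come from the thread, and fisher_homogeneity() is a hypothetical helper, not an existing package function. Each stratum's correlation r_i is transformed to z_i = atanh(r_i), weighted by n_i - 3, and Q = sum of w_i (z_i - z_bar)^2 is compared with a chi-square distribution on k - 1 df.

```r
# Sketch: Fisher z test for homogeneity of FITC-APC correlations across
# gene strata. Simulated data; fisher_homogeneity() is hypothetical.
set.seed(1)
genes <- paste0("g", 1:10)
n_per_gene <- 200
sim <- data.frame(
  gene = factor(rep(genes, each = n_per_gene), levels = genes),
  FITC = rnorm(length(genes) * n_per_gene)
)
# Assumed gene-specific slopes, so the true correlations differ by gene
slope <- setNames(seq(0.2, 1.1, length.out = 10), genes)
sim$APC <- slope[as.character(sim$gene)] * sim$FITC + rnorm(nrow(sim))

fisher_homogeneity <- function(x, y, group) {
  parts <- split(data.frame(x = x, y = y), group)
  r <- sapply(parts, function(d) cor(d$x, d$y))  # per-stratum correlation
  n <- sapply(parts, nrow)
  z <- atanh(r)                  # Fisher z-transform
  w <- n - 3                     # 1 / Var(z) for a simple correlation
  z_bar <- sum(w * z) / sum(w)   # precision-weighted common z
  Q <- sum(w * (z - z_bar)^2)    # approx. chi-square, k - 1 df under H0
  df <- length(r) - 1
  list(r = r, Q = Q, df = df,
       p.value = pchisq(Q, df = df, lower.tail = FALSE))
}

res <- fisher_homogeneity(sim$FITC, sim$APC, sim$gene)
round(res$r, 2)
res$p.value
```

For partial correlations controlling for q covariates, the usual weight is n - 3 - q instead of n - 3. Note that this test still treats every cell as an independent observation, so with thousands of cells per stratum it inherits the same pseudoreplication problem discussed above.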

Hope this makes sense.

Matthias

David Duffy wrote:
> On Thu, 17 Sep 2009, Matthias Gralle wrote:
>
>>
>> I have been trying for some weeks to state the correct design of my 
>> experiment as a GLM formula, and have not been able to find something 
>> appropriate in Pinheiro & Bates 2000 or with any of the local R 
>> users, so I am posting it here and hope somebody can help me.
>>
>> In each experimental condition, described by
>> 1) gene (10 levels, fixed, because of high interest to me)
>> 2) species (2 levels, fixed, because of high interest)
>> 3) day (2 levels, random)
>> 4) replicate (2 levels per day, random),
>>
>> I have several thousand data points consisting of two variables:
>>
>> 5) FITC (level of transfection of a cell)
>> 6) APC (antibody binding to the cell)
>>
>> ...pseudoreplication, and with 200000 data points in the original 
>> data set, any interaction will be
> What do you mean by pseudoreplication -- repeated measures? Don't you 
> want something like APC ~ gene + FITC + (FITC|gene) + species + 
> (1|day) + (1|replicate), where your interest is in the random 
> regression FITC|gene? Alternatively/equivalently, how about testing
> for homogeneity of the FITC-APC (partial) correlation coefficients 
> across 10 gene strata (what do these look like?). The latter is 
> natural for a multigroup SEM.
>
> I hope I understand what you are talking about
>
> Cheers, David Duffy.

-- 
Matthias Gralle, PhD
Dept. Evolutionary Genetics
Max Planck Institute for Evolutionary Anthropology
Deutscher Platz 6
04103 Leipzig, Germany
Tel +49 341 3550 519
Fax +49 341 3550 555