[BioC] When to treat technical reps as biological reps? WAS:Re: 2x2 factorial loop without common reference (pool)

Gordon K Smyth smyth at wehi.EDU.AU
Mon May 1 02:16:19 CEST 2006


Dear Jenny,

The second issue you identify (the DF issue) isn't an intrinsic characteristic of multi-level
variation.  Rather, the whole issue of DF is model-dependent.

If there really is no biological-replicate effect (the variance component is relatively small, so
the correlation is small), and you fit a model without it, then the whole issue with DF doesn't
arise.  In this case there is only one error level and you are perfectly justified in using all
the available DF to estimate it.
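
As a rough illustration of the DF counts at stake, here is a minimal Python sketch.  It assumes a
simplified one-way layout with one measurement per technical replicate and ignores the two-colour
loop structure of the actual experiment.  The 4 groups (from the 2x2 factorial) and the function
names are assumptions made for illustration only; the 3 biological and 4 technical replicates are
the numbers quoted below.

```python
def residual_df(n_groups, n_bio, n_tech, model_bio_effect):
    """Residual DF for a simplified one-way layout of treatment groups.

    model_bio_effect=False: every measurement counts as an independent
    observation, so there is only one error level.
    model_bio_effect=True: treatments are tested against variation between
    biological replicates, so only those replicates count.
    """
    units_per_group = n_bio if model_bio_effect else n_bio * n_tech
    return n_groups * units_per_group - n_groups


def intraclass_correlation(var_bio, var_tech):
    """Correlation between technical replicates sharing a biological replicate."""
    return var_bio / (var_bio + var_tech)


# Hypothetical layout: 4 groups, 3 biological reps each, 4 technical reps each.
print(residual_df(4, 3, 4, model_bio_effect=False))  # 44
print(residual_df(4, 3, 4, model_bio_effect=True))   # 8

# With no biological variance component the correlation vanishes,
# which is the case where the single-error-level model is justified.
print(intraclass_correlation(0.0, 1.0))  # 0.0
```

The gap between 44 and 8 only matters if the model with a biological-replicate effect is the
appropriate one; if the variance component is negligible, the first count is the relevant one.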

BTW, in the limma approach to multilevel models, the full DF is always used.  The DF issues
associated with ANOVA models are side-stepped.  This is a consequence of the extreme smoothing
across genes used by the approach.  A full explanation of this would need a lot of space ...
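
One piece of that smoothing can be sketched briefly.  The Python below shows the usual form of
empirical Bayes variance shrinkage, in which a gene's residual variance is pulled toward a prior
value estimated from all genes and the prior DF is added to the residual DF.  It is an
illustration only: the function name and the numbers are hypothetical, the prior values are taken
as given rather than estimated, and it does not cover how the correlation between replicates is
handled.

```python
def moderated_variance(s2, df_resid, s2_prior, df_prior):
    """Shrink a gene-wise residual variance toward a prior variance.

    Returns the posterior variance and the total DF used for inference.
    s2_prior and df_prior would be estimated from all genes together;
    here they are simply supplied by the caller.
    """
    total_df = df_prior + df_resid
    s2_post = (df_prior * s2_prior + df_resid * s2) / total_df
    return s2_post, total_df


# Hypothetical gene: residual variance 2.0 on 44 DF, prior variance 1.0 on 4 DF.
s2_post, total_df = moderated_variance(2.0, 44, 1.0, 4)
print(s2_post, total_df)
```

The posterior variance always lies between the gene-wise and prior values, and the total DF is
never smaller than the residual DF.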

Best wishes
Gordon

On Thu, April 27, 2006 2:00 am, Jenny Drnevich wrote:
> Hi everyone,
>
> Comments from Naomi and Gordon (below) about the technical replication in
> the 2x2 factorial loop experiment are very close to an issue I have been
> struggling with for several analyses: When (if ever) is it OK to treat
> technical replicates as biological replicates? Often this is done when
> there is more than one random effect (e.g. also have duplicate spots,
> blocking effects, etc.) because as Gordon has said previously, the between
> gene smoothing of limma cannot currently be done with more than one random
> effect. I know there have been many discussions on this on the list
> previously, but I can see two problems with treating tech reps as
> biological reps, and only one of them has been addressed:
>
> 1. There is likely to be artificially decreased variance within treatment
> groups because tech reps should have higher correlations than biological
> reps. This problem has been addressed several times and probably the best
> answer has come from Gordon along the lines of: often measurement error is
> larger than biological variation, so IF there are not higher correlations
> among tech reps then variance estimates should not be artificially decreased.
>
> 2. The DF is artificially increased due to pseudoreplication of the
> biological replicates, which leads to artificially lower p-values. This
> combined with even minor changes to the variance components can lead to
> large changes in p-values in my experience.
>
> As far as I know, this second problem has not been addressed. As a case in
> point, in the 2x2 factorial loop from before, each of the three biological
> replicates has 4 technical replicates, and even if there are not higher
> correlations, treating them as biological reps yields N=12 for each group
> instead of N=3. Shouldn't we be worried about this effect as well? In such
> cases when the experiment design really has more than one random effect,
> wouldn't the analysis be better off to model the random effects properly
> with a multilevel model such as lme/nlme rather than get the benefits of
> the empirical Bayes shrinkage either through ignoring technical replication
> or averaging dye swaps?
>
> Thanks,
> Jenny
>
> Naomi's comment:
> I would use single channel analysis for
> this.  The only problem is that Limma allows only
> 1 level of random effects.  Hence, you will need to average the dye-swaps.
>
> Gordon's comment:
>>PS. Although you don't say explicitly, I'm assuming that a1, a2 etc
>>represent some sort of biological replication. The above analysis
>>does not keep track of which array has which biological replicate of
>>each treatment. If you wanted to do a careful job of that, you would
>>have no choice but to do a "separate channel" analysis, as Naomi
>>Altman has suggested separately. If your biological replicates a1, a2
>>etc are not very different, compared to microarray measurement error,
>>then the above simpler analysis may be good enough.


