[BioC] Defining and handling replicates
Gordon Smyth
smyth at wehi.edu.au
Tue Jun 7 07:26:33 CEST 2005
Thanks for the further explanation.
At 12:20 PM 7/06/2005, Jonathan Arthur wrote:
>To swap the order of my original two questions:
>
>Gordon K Smyth wrote:
>
>>Does "drawn at a later time point" mean that the RNA was extracted from
>>the same organism but at a later time?
>
>The source of mRNA for the microarrays is plates of bacteria cultured
>from clinical samples provided by (human) subjects.
>
>In most cases, one patient => one sample => one culture => one RNA
>extraction => one microarray. I assume each microarray is a biological
>replicate grouped by the clinical status of the patient (disease vs control).
>
>In one case, however, one patient => one sample => one culture => one RNA
>extraction => *two* microarrays. The two arrays were performed several
>months apart but come from the same RNA extraction (frozen during the
>interim). I assume these are technical replicates.
Yes.
>In another case, one patient => one sample => *two* cultures made several
>months apart (sample frozen in interim) => two extractions => two
>microarrays. Is this a biological or technical replicate? The fact it is
>from the same patient/sample suggests a technical replicate, but the
>different culture suggests a biological replicate?
Technical replication refers to any replication which fails to repeat all
the relevant steps, so this is technical replication. However, as you've
explained clearly yourself, in any multistage process there are many
possible levels of technical replication. In your previous example, the
variation between the technical replicates would reflect only the
microarray component of variation. In this case, the variation between
technical replicates reflects variation between cultures and extractions as
well as the variation between microarrays.
>>Have you read the sections on technical replication in the Limma User's
>>Guide?
>>That would be the place to start.
>
>Yes, however I am having difficulty rationalising the section on "Two
>Groups: Affymetrix" with the two on "Technical Replication".
>
>If I treat everything as biological replicates, using a group-means
>parameterization, the design I use is:
>
>design <- cbind(disease=c(1,1,1,1,1,0,0,0,0,0,0,0),
>                control=c(0,0,0,0,0,1,1,1,1,1,1,1))
>
>Presumably, I need to do something like:
>
>corfit <- duplicateCorrelation(eset, design, ndups=1, block=c(???))
>fit <- lmFit(eset, design, block=c(???), correlation=corfit$consensus)
>
>checking first to make sure corfit$consensus is positive.
>
>But I am not clear on how to define the block vector?
For an experiment which systematically uses both biological and technical
replication, you would set block=Patient. In your experiment, however, you
don't have enough technical replication to reliably decompose variability
into biological and technical components, and the technical replication is
inconsistent anyway.

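
As a rough illustration of what block=Patient means in the systematic case,
here is a minimal sketch. The patient numbering is hypothetical (the thread
does not say which arrays share a patient), and the simulated matrix y only
stands in for eset. With just two replicated pairs, as here, the estimated
correlation should not be trusted; the sketch shows the mechanics only.

```r
# ASSUMPTION: arrays 1-2 share one patient and arrays 6-7 share another.
# The real assignment is not given in the thread.
patient <- c(1, 1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10)

# Group-means design from the question: 5 disease arrays, 7 control arrays
design <- cbind(disease = c(1,1,1,1,1,0,0,0,0,0,0,0),
                control = c(0,0,0,0,0,1,1,1,1,1,1,1))

# Simulated log-expression values standing in for the real data (200 genes)
set.seed(1)
y <- matrix(rnorm(200 * 12), nrow = 200, ncol = 12)

# Run the limma steps only if limma and statmod are installed
if (requireNamespace("limma", quietly = TRUE) &&
    requireNamespace("statmod", quietly = TRUE)) {
  corfit <- limma::duplicateCorrelation(y, design, ndups = 1, block = patient)
  # Proceed only if the consensus within-patient correlation is positive
  if (corfit$consensus.correlation > 0) {
    fit <- limma::lmFit(y, design, block = patient,
                        correlation = corfit$consensus.correlation)
  }
}
```

The block vector has one entry per array, and arrays from the same patient
get the same value.
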
One approach, which you have already mentioned, is to average over your
technical replicates. This will, however, invalidate any rigorous statistical
analysis, because the averages will be less variable than the individual
arrays, by an amount which is unknown, because you don't know how much
technical variation you are averaging over.

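
The variance argument can be seen in a small simulation. This is a sketch
under assumed values: sigma_bio and sigma_tech are made-up standard
deviations, and in a real experiment the technical one is exactly what is
unknown.

```r
# ASSUMPTION: illustrative standard deviations, not estimates from any data
set.seed(42)
n          <- 100000
sigma_bio  <- 1.0   # between-patient (biological) SD
sigma_tech <- 0.5   # between-array (technical) SD

bio <- rnorm(n, sd = sigma_bio)

# One array per patient: biological plus one technical error
single <- bio + rnorm(n, sd = sigma_tech)

# Average of two technical replicates from the same patient
averaged <- bio + (rnorm(n, sd = sigma_tech) + rnorm(n, sd = sigma_tech)) / 2

var_single   <- var(single)    # theory: sigma_bio^2 + sigma_tech^2     = 1.25
var_averaged <- var(averaged)  # theory: sigma_bio^2 + sigma_tech^2 / 2 = 1.125
```

The averaged values are less variable than the single arrays, and the size of
the gap depends on sigma_tech, so mixing averaged and unaveraged arrays in one
fit violates the equal-variance assumption by an unknown amount.
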
The simplest approach for you would be to choose what you think are
the best arrays for the two patients for whom you have replicates, and
discard the two superfluous arrays.

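
Discarding arrays means dropping the same columns from the expression data
and the same rows from the design. A minimal sketch, assuming (hypothetically)
that arrays 2 and 7 are the ones judged superfluous and using a simulated
matrix y in place of eset:

```r
# ASSUMPTION: arrays 2 and 7 are the replicates to discard (hypothetical)
discard <- c(2, 7)
keep    <- setdiff(seq_len(12), discard)

design <- cbind(disease = c(1,1,1,1,1,0,0,0,0,0,0,0),
                control = c(0,0,0,0,0,1,1,1,1,1,1,1))

# Simulated log-expression values standing in for the real data (200 genes)
set.seed(1)
y <- matrix(rnorm(200 * 12), nrow = 200, ncol = 12)

# Subset arrays (columns of y) and the matching rows of the design together
y_keep      <- y[, keep, drop = FALSE]
design_keep <- design[keep, , drop = FALSE]
```

The remaining ten arrays are then one per patient and can be treated as
biological replicates in the usual two-group analysis.
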
Alternatively, there is a trick which would allow you to use all your
arrays. But it requires a feature of the lmFit() function which I don't
wish to publicly document yet, as it would be easy to misuse, so I will
write to you offline.
Gordon
>Thanks for your help.
>
>Jonathan