[BioC] Defining and handling replicates

Tue Jun 7 07:26:33 CEST 2005

Thanks for the further explanation.

At 12:20 PM 7/06/2005, Jonathan Arthur wrote:
>To swap the order of my original two questions:
>
>Gordon K Smyth wrote:
>
>>Does "drawn at a later time point" mean that the RNA was extracted from 
>>the same organism but at a later time?
>
>The source of mRNA for the microarrays is plates of bacteria cultured 
>from clinical samples provided by (human) subjects.
>
>In most cases, one patient => one sample => one culture => one RNA 
>extraction => one microarray. I assume each microarray is a biological 
>replicate grouped by the clinical status of the patient (disease vs control).
>
>In one case, however, one patient => one sample => one culture => one RNA 
>extraction => *two* microarrays. The two arrays were performed several 
>months apart but come from the same RNA extraction (frozen during the 
>interim). I assume these are technical replicates.

Yes.

>In another case, one patient => one sample => *two* cultures made several 
>months apart (sample frozen in interim) => two extractions => two 
>microarrays. Is this a biological or technical replicate? The fact it is 
>from the same patient/sample suggests a technical replicate, but the 
>different culture suggests a biological replicate?

Technical replication refers to any replication which fails to repeat all 
the relevant steps, so this is technical replication. However, as you've 
explained clearly yourself, in any multistage process there are many 
possible levels of technical replication. In your previous example, the 
variation between the technical replicates would reflect only the 
microarray component of variation. In this case, the variation between 
technical replicates reflects variation between cultures and extractions as 
well as the variation between microarrays.

>>Have you read the sections on technical replication in the Limma User's 
>>Guide?
>>That would be the place to start.
>
>Yes, however I am having difficulty reconciling the section on "Two 
>Groups: Affymetrix" with the two on "Technical Replication".
>
>If I treat everything as biological replicates, using a group-means
>parameterization, the design I use is:
>
>design <- cbind(disease=c(1,1,1,1,1,0,0,0,0,0,0,0),
>                control=c(0,0,0,0,0,1,1,1,1,1,1,1))
>
>Presumably, I need to do something like:
>
>corfit <- duplicateCorrelation(eset, design, ndups=1, block=c(???))
>fit <- lmFit(eset, design, block=c(???), correlation=corfit$consensus)
>
>checking first to make sure corfit$consensus is positive.
>
>But I am not clear on how to define the block vector?

For an experiment which systematically uses both biological and technical 
replication, you would set block=Patient, where Patient is a vector with one 
entry per array identifying the patient that array came from. In your 
experiment, however, you don't have enough technical replication to reliably 
decompose variability into biological and technical components, and the 
technical replication is inconsistent anyway.
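
For reference, here is a minimal sketch of how a block vector could be formed 
for a design with systematic technical replication. The patient labels and 
the positions of the replicated arrays are hypothetical (the thread does not 
say which arrays are the replicates), and the expression matrix is simulated 
noise standing in for eset. Given the caveats above, this shows the mechanics 
only and is not a recommendation for this data set.

```r
# Group-means design for 12 arrays: 5 disease, 7 control (as in the thread).
group <- factor(rep(c("disease", "control"), times = c(5, 7)),
                levels = c("disease", "control"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# One label per array; arrays sharing a label come from the same patient.
# HYPOTHETICAL layout: arrays 1-2 share a patient, as do arrays 6-7.
patient <- c("P1", "P1", "P2", "P3", "P4",
             "P5", "P5", "P6", "P7", "P8", "P9", "P10")

# Run the limma steps only if the packages are installed.
if (requireNamespace("limma", quietly = TRUE) &&
    requireNamespace("statmod", quietly = TRUE)) {
  set.seed(1)
  y <- matrix(rnorm(200 * 12), nrow = 200)  # simulated stand-in for eset
  corfit <- limma::duplicateCorrelation(y, design, block = patient)
  # Proceed only when the estimated within-patient correlation is positive.
  if (corfit$consensus.correlation > 0) {
    fit <- limma::lmFit(y, design, block = patient,
                        correlation = corfit$consensus.correlation)
  }
}
```

With only two replicated patients, the within-patient correlation is 
estimated from very little information, which is why this route is not 
reliable here.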

One approach, which you already have mentioned, is to average over your 
technical replicates. This will however invalidate any rigorous statistical 
analysis, because the averages will be less variable than the individual 
arrays, by an amount which is unknown, because you don't know how much 
technical variation you are averaging over.
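
To make the size of that effect concrete, here is a small R sketch under a 
simple additive model, which is an assumption for illustration: each array's 
variance is an independent biological component plus a technical component. 
The component values are made up and are not estimates from this experiment.

```r
# Illustrative variance components (made-up values).
sigma2_bio  <- 1.0   # between-patient variance
sigma2_tech <- 0.5   # variance added by the replicated technical steps

# A patient represented by a single array.
var_single <- sigma2_bio + sigma2_tech

# A patient represented by the mean of two technical replicates:
# only the technical component is halved by averaging.
var_mean2 <- sigma2_bio + sigma2_tech / 2

# Ratio of the two variances; it depends on sigma2_tech, which is unknown
# in practice and differs between the two kinds of replicate in this thread.
ratio <- var_mean2 / var_single
print(ratio)
```

Because the ratio depends on the unknown technical variance, the averaged 
values cannot simply be given the same weight as the single arrays, nor a 
known different weight.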

The simplest approach would be to choose what you think are the best arrays 
for the two patients for whom you have replicates, and discard the two 
superfluous arrays.

Alternatively, there is a trick which would allow you to use all your 
arrays. But it requires a feature of the lmFit() function which I don't 
wish to document publicly yet, as it would be easy to misuse, so I will 
write to you offline.

Gordon

>Thanks for your help.
>
>Jonathan