[BioC] dye swaps of technical replicates and variable numbers of replicate spots
Gordon Smyth
smyth at wehi.edu.au
Wed Aug 20 11:42:30 MEST 2003
At 02:23 AM 20/08/2003, Ramon Diaz-Uriarte wrote:
>Dear all,
>
>I am analyzing some cDNA data; in the simplest case there are a total of 6
>arrays, with three biological replicates; for each biological replicate, the
>arrays are duplicated and arrayed using dye-swap. Of course, for some genes
>there might be missing values in some of the replicates.
>In addition, some genes are replicated within arrays 5 times, whereas other
>genes are replicated twice (or three times, or four times, or six times), and
>yet others are not replicated at all.
>
>These are the two questions:
>
>1. The limma package includes facilities for handling replicate spots within
>arrays. However, from the help pages and the Bob mutant data example in the
>limma manual, it seems to me that it expects a fairly regular structure.
Yes, that's correct. The regular structure is important. Firstly because
limma handles within-array replicate spots by estimating the spatial
correlation between the replicates, and it is only reasonable to assume
that this correlation is shared by all genes if the replicate structure is
entirely regular. Secondly because subsequent inference methods for
assessing differential expression assume that all genes have been treated
the same and can be treated as having exchangeable standard deviation
estimators. (I understand that this might not be entirely clear - I am
writing up the methodology now as a technical report and the manuscript
will explain the methodology and assumptions more thoroughly.)
So limma is designed to handle within-array replicates arising from robotic
replication in which multiple spots are printed by making multiple dips of
the array printer heads into the same wells on the 384-well plates of DNA.
It is not designed to handle replicates arising from redundancy in the DNA
library unless this is completely regular.
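To illustrate what "completely regular" means for the replicate counts, here is a minimal base-R sketch. It is not limma code, and the gene identifiers are invented. It only checks that every gene has the same number of spots; it says nothing about whether the replicates are equally spaced on the array, which also matters for the common-correlation assumption.

```r
# Hypothetical spot annotation: one gene identifier per spot on the array.
# The identifiers and counts are made up for illustration.
gene_id <- c(rep("geneA", 5), rep("geneB", 2), rep("geneC", 2), "geneD")

# Number of spots printed for each gene
spots_per_gene <- table(gene_id)

# How many genes have 1, 2, ... spots
replicate_pattern <- table(spots_per_gene)

# The counts are regular only if every gene has the same number of spots
is_regular <- length(unique(as.vector(spots_per_gene))) == 1

print(spots_per_gene)
print(replicate_pattern)
print(is_regular)
```

With counts like these (one gene spotted five times, two genes twice, one gene once), the check fails, which is the situation described in the question.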
>I understand that my two options are:
>a) take the easy way out, and compute a mean or a median of the replicates;
>b) "adapt" dupcor.series to my situation to get an estimate of the correlation
>of replicates, and then "adapt" gls.series (or call gls directly);
>
>Is there any other option?
I would not recommend either of the above, at least in conjunction with
limma. If you take means or medians of spots, and the number of spots being
averaged differs between genes, then this will invalidate the assumption
used by ebayes that all residual standard deviations are exchangeable
(because different genes will be estimated with different precisions). Also
you can't adapt dupcor.series because dupcor.series is designed to
estimate a common spatial correlation, and different genes will have
different between-replicate correlations if they are irregularly spaced.
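The precision argument can be made concrete with a small base-R calculation. This is a sketch under simplifying assumptions that are mine, not limma's: every spot has the same variance sigma2, and every pair of replicate spots has the same correlation rho. Under those assumptions the variance of an average of n spots is sigma2 * (1 + (n - 1) * rho) / n.

```r
# Variance of the average of n equally correlated spots, each with variance
# sigma2 and common pairwise correlation rho (illustrative assumption).
var_of_mean <- function(n, sigma2 = 1, rho = 0) {
  sigma2 * (1 + (n - 1) * rho) / n
}

# Replicate counts mentioned in the question: 1 to 6 spots per gene
n_spots <- 1:6

# Independent spots
print(round(var_of_mean(n_spots, rho = 0), 3))

# Moderately correlated spots (rho chosen arbitrarily)
print(round(var_of_mean(n_spots, rho = 0.5), 3))
```

In both cases the variance of the averaged value changes with the number of spots, so genes averaged over different numbers of spots do not have comparable standard deviations.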
It might not be ideal, but I would avoid averaging the within-array
replicates and just treat all spots as corresponding to different genes.
Then you can be very confident that you have a reliable result if the same
gene comes up differentially expressed several times (from different
locations on the array).
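Once each spot has been analysed as if it were its own gene, the per-spot calls can be tallied by gene to see which genes come up repeatedly. A minimal base-R sketch follows; the data frame, its column names and its values are hypothetical, and how a spot gets "called" is left to whatever test is used upstream.

```r
# Hypothetical per-spot results: gene identifier and whether that spot was
# called differentially expressed. All values are invented for illustration.
spot_results <- data.frame(
  gene_id   = c("geneA", "geneA", "geneA", "geneB", "geneB", "geneC"),
  called_de = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Number of spots per gene, and number of those spots that were called
n_spots  <- tapply(spot_results$called_de, spot_results$gene_id, length)
n_called <- tapply(spot_results$called_de, spot_results$gene_id, sum)

summary_by_gene <- data.frame(
  gene_id  = names(n_spots),
  n_spots  = as.vector(n_spots),
  n_called = as.vector(n_called)
)
print(summary_by_gene)
```

A gene called at every one of its spots (geneA in this toy table) is the kind of repeated result referred to above.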
>2. The dye-swap set up resembles the swirl example in the limma manual, but
>here the dye swaps are of technical replicates. The first idea that came to
>my mind is to fit (e.g., using the nlme package) a random effects model like:
>
>lme(log.ratio ~ the.interesting.effect, random = ~1|the.biological.replicate)
>
>but since I am only interested in the interesting effect (not the replicate
>variation) I think I can get what I want with limma doing:
>
> > design
>   Efect R1 R2 R3
> 1     0  1  0  0
> 2     1  1  0  0
> 3     0  0  1  0
> 4     1  0  1  0
> 5     0  0  0  1
> 6     1  0  0  1
> > lm.series(data, design)
>
>Does this make sense?
Yes, the design matrix that you propose should work in limma and will give
you valid results. The random-effects lme approach that you mention above
though is in principle even better. You could get the best possible results
by taking output from lme and inputting it in the right way into ebayes.
(This is the obvious way to handle technical replicates, but I haven't seen
anyone do it yet.)
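To show what the proposed design matrix estimates for a single gene, here is a sketch in base R using lm.fit rather than limma's own fitting functions. The log-ratios are invented, and the column name Effect is my spelling. For this balanced layout, the least-squares estimate of the effect coefficient equals the average of the within-replicate differences (second array minus first), and two residual degrees of freedom remain.

```r
# Hypothetical log-ratios for one gene on six arrays (values invented).
# Arrays 1-2, 3-4 and 5-6 are the technical replicate pairs of
# biological replicates 1, 2 and 3.
log_ratio <- c(-0.9, 1.1, -1.2, 0.8, -1.0, 1.3)

# Design matrix with the same layout as in the question
design <- cbind(
  Effect = c(0, 1, 0, 1, 0, 1),
  R1     = c(1, 1, 0, 0, 0, 0),
  R2     = c(0, 0, 1, 1, 0, 0),
  R3     = c(0, 0, 0, 0, 1, 1)
)

# Ordinary least squares for this one gene
fit <- lm.fit(design, log_ratio)
effect_hat <- unname(fit$coefficients["Effect"])

# Within-replicate differences: second array minus first array of each pair
paired_diff <- log_ratio[c(2, 4, 6)] - log_ratio[c(1, 3, 5)]

print(effect_hat)
print(mean(paired_diff))
print(fit$df.residual)
```

The replicate columns R1 to R3 absorb the biological-replicate level, so only the within-pair contrast informs the effect estimate.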
Best wishes
Gordon
> Does it make sense given the mess with the variable
>number of replicates within arrays (question 1)?
>
>
>Thanks,
>
>Ramón
>
>--
>Ramón Díaz-Uriarte
>Bioinformatics Unit
>Centro Nacional de Investigaciones Oncológicas (CNIO)
>(Spanish National Cancer Center)
>Melchor Fernández Almagro, 3
>28029 Madrid (Spain)
>Fax: +-34-91-224-6972
>Phone: +-34-91-224-6900
>
>http://bioinfo.cnio.es/~rdiaz