[BioC] Pooling in microarray studies

Mon Oct 6 11:11:49 MEST 2003

Hi Wiesner,

Pooling, as you have noted, is often done when sufficient
mRNA is not available. I see that others have indicated that
you probably have enough mRNA...but I will make a few
comments anyway. Pooling is also done in an effort to
reduce the effect of biological variability. There are
many studies that have used pooled samples for this
latter purpose. Provided certain assumptions hold
(mRNAs average out when pooled and there are no outliers),
pooling can be advantageous. You end up with an estimate
of the sum of pool-to-pool and technical variability, and
provided you are making inferences about pools of subjects
rather than about individuals (often the case
when studying experimental populations), this estimate is
the one you need. I discuss this in a recent Biostatistics paper.
I'll send it under separate cover.
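
To make the variance point concrete, here is a small Python simulation
for a single gene. It is only a sketch, assuming that the pool's level is
the mean of its subjects' levels (mRNAs average out), that there are no
outliers, and that technical error is additive and Gaussian. The function
names and the values of sigma_bio and sigma_tech are hypothetical and are
not taken from the paper.

```python
import random
import statistics


def simulate_pool_measurements(n_pools, pool_size, sigma_bio, sigma_tech,
                               mu=0.0, seed=0):
    """Simulate one chip measurement per pool for a single gene.

    Assumption: the pool's true level is the mean of its subjects' levels
    (mRNAs average out when pooled), and technical error is additive.
    """
    rng = random.Random(seed)
    measurements = []
    for _ in range(n_pools):
        # Biological variability: each subject has its own expression level.
        subjects = [rng.gauss(mu, sigma_bio) for _ in range(pool_size)]
        pool_level = statistics.fmean(subjects)
        # Technical variability: one noisy chip per pool.
        measurements.append(pool_level + rng.gauss(0.0, sigma_tech))
    return measurements


def expected_variance(pool_size, sigma_bio, sigma_tech):
    """Pool-to-pool variance (sigma_bio^2 / pool_size) plus technical variance."""
    return sigma_bio ** 2 / pool_size + sigma_tech ** 2


if __name__ == "__main__":
    chips = simulate_pool_measurements(n_pools=20000, pool_size=2,
                                       sigma_bio=1.0, sigma_tech=0.5)
    print("observed variance:", round(statistics.variance(chips), 3))
    print("expected variance:", expected_variance(2, 1.0, 0.5))
```

Under these assumptions, the variance across chips should come out close
to sigma_bio^2 / pool_size + sigma_tech^2, which is the quantity needed
for inferences about pools.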

We are doing studies to check these assumptions using ~60
Affy chips. There is some evidence that the assumptions
might NOT hold.

Finally, I will note that if there is contamination
(let's say an outlier animal), pooling might still be useful.
As I argued in a talk at JSM (slides are
on my website), there are different cases of contamination to
consider. The most realistic one is that an animal is an outlier
in, say, some collection of genes, and of course you don't know what
that collection is. You'll either end up averaging at the
mRNA level (again, provided mRNAs average) or you will end up
averaging the individual measurements across arrays (after normalizing).
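
A toy Python sketch of those two routes for one gene is below. It assumes
that pooled and individual measurements are on the same scale, that mRNAs
average when pooled, and that technical noise can be ignored for the
comparison. The function names and expression values are hypothetical.

```python
import statistics


def pooled_estimate(levels):
    """One chip on the pooled sample: averaging happens at the mRNA level.

    Assumption: mRNAs average when pooled; technical noise is ignored.
    """
    return statistics.fmean(levels)


def averaged_estimate(levels, chip_errors):
    """One chip per animal, then average the normalized measurements."""
    measured = [level + err for level, err in zip(levels, chip_errors)]
    return statistics.fmean(measured)


def outlier_shift(n_animals, typical, outlier):
    """Shift in the group estimate caused by a single outlier animal."""
    levels = [typical] * (n_animals - 1) + [outlier]
    return pooled_estimate(levels) - typical


if __name__ == "__main__":
    levels = [8.0, 8.0, 8.0, 8.0, 12.0]  # last animal is the outlier
    print("pooled:  ", pooled_estimate(levels))
    print("averaged:", averaged_estimate(levels, [0.0] * len(levels)))
    print("shift:   ", outlier_shift(5, typical=8.0, outlier=12.0))
```

In this simplified setting both routes dilute the outlier by the same
factor, 1 / n_animals, so the group estimate moves by (outlier - typical)
divided by the number of animals.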

I hope that helps.

Christina

Christina Kendziorski
Assistant Professor
Department of Biostatistics and Medical Informatics
University of Wisconsin - Madison
Medical Sciences Center (6729)
1300 University Avenue
Madison, Wisconsin 53706

Phone: (608) 262-3146
Fax: (608) 265-7916

> I have a question about the pooling of mRNA
> samples. Someone approached me about the
> following problem:
>
> The study wants to use Affymetrix chips to study
> changes in expression between a group of treated
> mice and a group of untreated mice. There are 10 mice
> in each group. It is only possible to extract
> 8 ug of RNA from each mouse, not enough for one chip.
> (According to the experimenters, they require 10 ug per
> chip.) So it is not possible to use biological
> replicate chips for each individual mouse. Now the issue
> is whether to perhaps pool the RNA in each group
> and carry out analysis on technical replicates from the
> pooled samples.
>
> As I understand it, pooling may reduce the precision, with
> the risk that one or a few samples can dominate the outcome, and
> that averaging over single sample hybridisations is perhaps
> safer than using pooled samples. However in this case you cannot
> do single sample hybridisations.
>
> I was wondering if the following approach is an acceptable
> compromise to retain at least some information on the between
> sample variation in each group:
>
> Mix the RNA from 2 different mice on a single chip to get 5
> hybridisations, where the hybridisation on each chip is from the
> mix of the RNA samples of two mice? I thought that this may
> enable you to see, to some extent, whether all the mice are behaving
> similarly. Of course one would not be able to distinguish
> between the behaviour of the two mice relating to the same
> chip. Or is it better to accept that you do not have enough
> RNA to hybridize the sample for each individual to a separate
> chip and pool the samples and accept the risk that
> one sample may dominate the outcome? The best
> solution did not seem obvious (to me at least!)
>
> Any comments will be much appreciated.
>
> Wiesner
>
>
>
> Wiesner J. Vos
> Department of Statistics
> University of Oxford
> 1 South Parks Road
> OX1 3TG
> United Kingdom