[R] Question about the impact of a pilot experiment on the overall statistical interpretation of the subsequent work

Tue Apr 17 21:56:02 CEST 2007

Hi,

I have a question regarding the impact of a pilot experiment on the
overall statistical interpretation of the subsequent work.

The context is as follows:

In a lab there is one professor and three graduate students, A, B, and
C.  They are analyzing a disease to discover genes that differentiate
the (+) and (-) categories, such as disease versus non-disease.  The
number of (+) samples is m and the number of (-) samples is n.  Both m
and n are sufficiently large, e.g. greater than 100.

Pilot experiment:
To save effort and resources, the professor decided to pool the samples
in each category in equal amounts, so that he had only two pooled
samples, one (+) and one (-).  His argument was that if there were no
difference between the pooled samples, he would abandon the project.
Graduate student A did a microarray analysis of the pooled (+) and (-)
samples and found that genes X, Y, and Z have fold changes greater than
100.

The professor thought this was interesting and encouraging, based on
his biological insight into genes X, Y, and Z and their potential link
to the disease.
(1) He asked graduate student B to do a protein analysis of all the
original samples (m, n) using a different technique (western blot); B
found that gene Y is truly differential.  Based on the protein analysis
data, graduate student A calculated a P value using a t test to describe
the statistical significance of gene Y in differentiating the (+) and
(-) categories.
(2) Simultaneously, he asked graduate student C to do a full-scale
microarray experiment using all m and n samples individually.  It was
very laborious work, but graduate student C finished everything and,
using some off-the-shelf microarray statistical packages, found genes
Y, Z, and another unidentified gene W to be statistically significant.
He calculated the false discovery rate and P values of these genes for
differentiating the (+) and (-) categories.
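
For readers unfamiliar with the false discovery rate mentioned in (2),
here is a minimal sketch in Python of one common adjustment, the
Benjamini-Hochberg procedure.  This is an assumption for illustration:
I do not know which method student C's packages actually used, and the
function name and the P values below are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical P values for four genes (not real data).
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A gene would then be called significant at FDR level q if its adjusted
value is at most q.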

The professor presented the work of students A, B, and C, including the
calculated statistics, at a conference.  In the audience, statistician D
commented that the professor had made a mistake: because he used the
SAME samples, whether pooled or individual, in both the pilot and the
subsequent experiments, statistically the professor is "cheating" and
his students' calculated statistics are no longer valid.

Can the statisticians on this mailing list comment on this story?  One
thing I want to emphasize is that nobody disputes that it is critical
to use a different set of samples to validate the discoveries.  The
question is whether, conditional on the pilot experiment with the
pooled samples, the subsequent full-scale experiments using the SAME
samples can yield meaningful statistics to describe the differences in
the discovered features.
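
To make the question concrete, here is a minimal simulation sketch in
Python (standard library only).  It is a toy model, not an analysis of
the actual data, and everything in it is an assumption: no gene is truly
differential, expression values are standard normal, a pooled sample is
modelled as the mean of its group, the pilot keeps the top_k genes with
the largest pooled difference, and the follow-up reuses the very same
measurements (a worst case, since the western blot in (1) is a different
measurement of the same samples).  The P value uses a normal
approximation to Welch's t statistic.

```python
import math
import random
import statistics


def approx_p_value(a, b):
    """Two-sided p-value for a difference in means.

    Normal approximation to Welch's t statistic; reasonable only when
    both groups are large.
    """
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    t = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(t) / math.sqrt(2))


def simulate(n_genes=500, m=50, n=50, top_k=3, alpha=0.05,
             n_rounds=20, seed=1):
    """Return (same_rate, fresh_rate): the fraction of pilot-selected
    null genes with p < alpha on the same vs. on fresh samples."""
    rng = random.Random(seed)

    def draw(size):
        return [rng.gauss(0.0, 1.0) for _ in range(size)]

    same_hits = fresh_hits = 0
    for _ in range(n_rounds):
        # Every gene is null: (+) and (-) come from one distribution.
        genes = [(draw(m), draw(n)) for _ in range(n_genes)]
        # Pilot: keep the top_k genes with the largest pooled difference.
        selected = sorted(
            genes,
            key=lambda g: abs(statistics.fmean(g[0]) - statistics.fmean(g[1])),
            reverse=True,
        )[:top_k]
        for pos, neg in selected:
            # Follow-up on the SAME samples that drove the selection.
            same_hits += approx_p_value(pos, neg) < alpha
            # Follow-up on new, independent samples of the same sizes.
            fresh_hits += approx_p_value(draw(m), draw(n)) < alpha
    total = n_rounds * top_k
    return same_hits / total, fresh_hits / total


if __name__ == "__main__":
    same_rate, fresh_rate = simulate()
    print(f"rejection rate, same samples:  {same_rate:.2f}")
    print(f"rejection rate, fresh samples: {fresh_rate:.2f}")
```

Under these assumptions I would expect the same-sample rejection rate to
come out far above alpha and the fresh-sample rate to stay near alpha.
The sketch covers only the follow-up of pilot-selected genes; it says
nothing about a full-scale analysis that tests every gene.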

Thanks.