[R] Question on the impact of the pilot experiment on the overall statistical interpretation of the subsequent work

Tue Apr 17 23:16:31 CEST 2007

Bruce,

Far below you ask for comment on an issue of statistical interpretation 
of a collection of biological experiments.

Your university (judging from your email handle) has one of the best 
statistics departments in the world and some of the best biostatisticians 
in the galaxy.

You would do far better (than posting your query here) to take your issue 
up with one of them.

In a face-to-face meeting with one of them you will get a much better 
analysis and discussion of the issues than you could hope for from a list 
like this, notwithstanding that some statisticians sometimes provide 
thoughtful answers to posts asking for statistical help.

Chuck

On Tue, 17 Apr 2007, Bruce Ling wrote:

> Hi,
>
> I have a question regarding the impact of the pilot experiment on the
> overall statistical interpretation of the subsequent work.
>
> The context is as follows:
>
> In a lab there are one professor and three graduate students, A, B, and
> C.  They are analyzing a disease to discover genes that differentiate
> the (+) and (-) categories, e.g. disease vs. non-disease.  The number
> of (+) samples is m and the number of (-) samples is n.  Both m and n
> are sufficiently large, e.g. greater than 100.
>
> Pilot experiment:
> To save effort and resources, the professor decided to pool the
> samples in each category in equal amounts, so that he had only two
> pooled samples, one (+) and one (-).  His argument was that if there
> were no difference between the pooled samples, he would abandon the
> project.  Graduate student A ran a microarray analysis of the pooled
> (+) and (-) samples and found that genes X, Y, and Z have fold changes
> greater than 100.
>
> The professor found this interesting and encouraging, based on his
> biological insight into genes X, Y, and Z and their potential links to
> the disease.
> (1) He asked graduate student B to do a protein analysis of all the
> original samples (m and n) using a different technique (western blot),
> and B found that gene Y is truly differential.  Based on the protein
> data, graduate student A calculated a P value using a t test to
> describe the statistical significance of gene Y in differentiating the
> (+) and (-) categories.
> (2) At the same time, he asked graduate student C to do a full-scale
> microarray experiment using all m and n samples individually.  It was
> very laborious work, but C finished everything and, using some
> off-the-shelf microarray statistical packages, found genes Y and Z and
> another, unidentified gene W to be statistically significant.  He
> calculated the false discovery rate and P value for each of these
> genes in differentiating the (+) and (-) categories.
>
> The professor presented the work of students A, B, and C, including
> the calculated statistics, at a conference.  Statistician D, in the
> audience, commented that the professor had made a mistake: because he
> used the SAME samples, whether pooled or individual, in both the pilot
> and the subsequent experiments, the professor is statistically
> "cheating" and his students' calculated statistics are no longer valid.
>
> Can statisticians on this mailing list comment on this story?  One
> thing I want to emphasize is that nobody disputes that it is critical
> to use a different set of samples to validate the discoveries.  The
> question is whether, contingent on the pilot experiment with the pooled
> samples, the subsequent full-scale experiments using the SAME samples
> can yield meaningful statistics describing the differences in the
> discovered features.
>
> Thanks.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
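
For what it is worth, the concern D raised (picking genes from the pilot
and then testing them on the same samples) is easy to see in a toy
simulation. The Python below is a minimal sketch under my own simplifying
assumptions, not a model of the lab's actual assays: every gene is null
(no true (+)/(-) difference), a pooled sample is treated as the plain
average of its individual samples, the pilot keeps the genes with the
largest pooled difference, and the follow-up is a Welch-type test (normal
approximation, tolerable only because each group has more than 100
samples) on the same individual values. All function names and parameter
values are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def welch_p(x, y):
    """Two-sided p value for a difference in means (Welch statistic).

    Uses the normal approximation to the t distribution, which is only
    reasonable because both groups here are large (more than 100 each).
    """
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    z = (statistics.fmean(x) - statistics.fmean(y)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def null_genes(n_genes, m, n, rng):
    """Hypothetical data: every gene is null, (+) and (-) both N(0, 1)."""
    return [([rng.gauss(0.0, 1.0) for _ in range(m)],
             [rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(n_genes)]


def selected_pvalues(n_genes=1000, m=120, n=120, top_k=10, seed=1):
    """Return p values for the top_k pilot-selected genes, tested on the
    SAME samples and on FRESH independent samples."""
    rng = random.Random(seed)
    genes = null_genes(n_genes, m, n, rng)
    # Pilot: pooling equal amounts is treated as averaging (an assumption),
    # so genes are ranked by the absolute difference of the pooled values.
    pooled_diff = [abs(statistics.fmean(p) - statistics.fmean(q))
                   for p, q in genes]
    top = sorted(range(n_genes), key=pooled_diff.__getitem__,
                 reverse=True)[:top_k]
    # Follow-up on the SAME individual samples that drove the selection.
    same = [welch_p(*genes[g]) for g in top]
    # Follow-up on new, independent samples for the same selected genes.
    fresh = [welch_p(*null_genes(1, m, n, rng)[0]) for _ in top]
    return same, fresh


if __name__ == "__main__":
    same, fresh = selected_pvalues()
    print("same samples: ", sum(p < 0.05 for p in same), "of", len(same),
          "with p < 0.05")
    print("fresh samples:", sum(p < 0.05 for p in fresh), "of", len(fresh),
          "with p < 0.05")
```

Run as a script, the same-sample p values for the selected null genes
typically all fall below 0.05, while the fresh-sample p values behave like
ordinary null p values. In the real setting the assays differ (pooled
array, western blot, individual arrays), so technical noise is not shared,
but the sample-to-sample biological variation that drove the selection is,
which is why D's point may still apply; how much it matters is exactly the
kind of question to take to a biostatistician.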

Charles C. Berry                        (858) 534-2098
                                          Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0901