[BioC] Harsh results using limma!

Mon Aug 16 10:31:41 CEST 2004

Hi Guys

Well, this turned into a very interesting discussion; thank you for your
input.  All of the explanations lead to a single conclusion: I (we?) need
to find significant differences which are present in only subsets of the
data.

Let me explain - here I had samples from three animals.  Two animals
showed what looks like highly repeatable differential expression, and
the third did not.  If we assume that this is down to biological
variation (i.e. two of my animals showed an immune response and the
third did not, simply because they are different animals), then
standard statistical tests are missing an effect which is present in
two-thirds of my population.  If you ask me "are you interested in
finding effects which are present in only two-thirds of your
population?" then the answer is: of course I am!

Over the last 5 years the whole issue of pharmacogenomics has become
huge (the right drug for the right patient etc.), and I know I am
speculating wildly here, but perhaps that is exactly what my data is
showing me: that two-thirds of my population show a particular immune
response but the other third does not.  And that's very interesting ;-)

Now, to the non-statistician, the "bull in a china shop" approach to
solving this would appear to be to take all possible subsets of my data
and run limma on each of them, to find changes that are significant
within a subset.  Clearly this becomes problematic for large datasets.
Presumably there are many more intelligent ways...?
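
To put a number on "problematic", here is a minimal Python sketch of the brute-force idea. It only enumerates the candidate subsets of animals (at least two animals each) and counts them; it does not run any analysis. The animal labels and the function names are made up for illustration.

```python
from itertools import combinations
from math import comb


def animal_subsets(animals, min_size=2):
    """Yield every subset of `animals` with at least `min_size` members."""
    for size in range(min_size, len(animals) + 1):
        yield from combinations(animals, size)


def subset_count(n_animals, min_size=2):
    """Number of subsets that would each need their own analysis."""
    return sum(comb(n_animals, k) for k in range(min_size, n_animals + 1))


# Hypothetical labels for the three animals in this experiment.
for subset in animal_subsets(["A", "B", "C"]):
    print(subset)

for n in (3, 10, 20):
    print(n, "animals ->", subset_count(n), "subsets")
```

With three animals there are only four subsets to try, but the count roughly doubles with every extra animal, and each subset brings its own round of testing across all genes.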

Thanks again

Mick

-----Original Message-----
From: Gordon K Smyth [mailto:smyth at wehi.edu.au] 
Sent: 14 August 2004 01:07
To: David K Pritchard
Cc: Anthony Rossini; bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Harsh results using limma!

> I think Mick's experiences point out a fundamental problem with
> current statistical analysis of microarray data.  If his data was .2,
> .2, .2, (dye flips) -.2, -.2, -.2 then Limma would note this gene as
> highly differentially expressed.  In contrast when he sees 6.29, 5.54,
> 0.2, (dye flips) -5.27, -4.61, -0.2 Limma did not mark it as
> differentially expressed.

Actually it is not true that limma will necessarily rank the first gene
higher than the second.  Obviously t-tests would do so, but limma may
well rank the second gene higher depending on the information about
variability inferred from the whole data set.  Looking at fold change
alone ranks the second gene higher while t-tests would rank the first
higher.  Limma is somewhere in between depending on the dataset.  A
typical microarray dataset actually would lead to the second gene being
ranked higher, i.e., would lead to the ranking that you would prefer.
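
The three rankings can be illustrated with a small Python sketch using the two genes quoted above. It is not limma itself: the moderated statistic shrinks each gene's sample variance towards a prior variance, but here the prior degrees of freedom and prior variance are made-up values chosen for illustration, whereas limma estimates them from all genes in the dataset. The function names are hypothetical too.

```python
import math
import statistics

# Log-ratios from the quoted example; the last three arrays are dye flips,
# so their sign is reversed before averaging.
DYE_SIGN = [1, 1, 1, -1, -1, -1]
gene_small = [0.2, 0.2, 0.2, -0.2, -0.2, -0.2]
gene_large = [6.29, 5.54, 0.2, -5.27, -4.61, -0.2]

# ASSUMPTION: illustrative prior values only. limma estimates the prior
# degrees of freedom and prior variance from the whole dataset.
PRIOR_DF = 4.0
PRIOR_VAR = 0.25


def oriented(values):
    """Undo the dye flips so every value has the same orientation."""
    return [sign * v for sign, v in zip(DYE_SIGN, values)]


def ordinary_t(values):
    """One-sample t-statistic using only this gene's own variance."""
    y = oriented(values)
    se = math.sqrt(statistics.variance(y) / len(y))
    return math.inf if se == 0 else statistics.mean(y) / se


def moderated_t(values, prior_df=PRIOR_DF, prior_var=PRIOR_VAR):
    """t-statistic with the variance shrunk towards the prior variance."""
    y = oriented(values)
    df = len(y) - 1
    post_var = (prior_df * prior_var + df * statistics.variance(y)) / (prior_df + df)
    return statistics.mean(y) / math.sqrt(post_var / len(y))


for name, gene in [("small, consistent", gene_small), ("large, variable", gene_large)]:
    mean_fc = statistics.mean(oriented(gene))
    print(name, round(mean_fc, 3), ordinary_t(gene), round(moderated_t(gene), 2))
```

Under these assumed prior values the ordinary t-statistic puts the small, perfectly consistent gene first (its sample variance is zero), while both the fold change and the moderated statistic put the large, variable gene first. A different prior would move the moderated ranking, which is the sense in which the result depends on the dataset.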

>      As a biologist I would argue the case for the genes actually
> being differentially expressed is much higher in the second case.  Yet
> using modified T-statistic approaches and with the limited number of
> repeats common with current array experiments,  I see array
> experiments "missing" these very interesting high variance genes all
> the time.
>     Current analytical techniques put a high premium on consistency of
> results and a lower premium on strength of differential expression
> which is the parameter that biologists would argue is the most
> significant.
>      There are a variety of biological reasons why high variance genes
> should exist and personally I think these genes are likely to be the
> biologically interesting ones that we should be looking for on
> microarrays.
>      I understand why Limma does what it does and it is a 
> fantastically useful program. However, I would suggest to the 
> statisticians reading this message  that it would be very useful to 
> start developing analytical techniques which could better detect high 
> variance genes.

I agree with the overall point.  Two strategies currently available are:

1. Use spot quality weights.  In the example given above it appears that
   two of the arrays or spots have failed to register any worthwhile fold
   change for a gene which is differentially expressed on the other
   arrays.  If this can be identified as being due to low quality spots
   or arrays, then the values may be down-weighted in an analysis and the
   gene will revert to being highly significant.

2. If small fold changes are not of biological interest to you, then you
   can require a minimum magnitude for the fold change as well as looking
   for evidence of differential expression.
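
As a rough illustration of the two strategies above, the Python sketch below fits a weighted mean to the high-variance gene from the quoted example and then applies a combined fold-change and t-statistic cutoff. The quality weights, both thresholds and the function names are assumptions made up for this example, and the plain weighted t-statistic stands in for a full limma fit.

```python
import math

# Dye-flip-corrected log-ratios for the high-variance gene quoted above.
y = [6.29, 5.54, 0.2, 5.27, 4.61, 0.2]

# ASSUMPTION: hypothetical quality weights. Spots 3 and 6 are imagined to
# have been flagged as low quality; nothing in the thread establishes that.
equal_weights = [1.0] * 6
quality_weights = [1.0, 1.0, 0.1, 1.0, 1.0, 0.1]


def weighted_fit(values, weights):
    """Weighted least squares for a single mean; returns (estimate, t)."""
    total_w = sum(weights)
    est = sum(w * v for w, v in zip(weights, values)) / total_w
    df = len(values) - 1
    resid_var = sum(w * (v - est) ** 2 for w, v in zip(weights, values)) / df
    se = math.sqrt(resid_var / total_w)
    t = est / se if se > 0 else math.copysign(math.inf, est)
    return est, t


def passes(values, weights, min_abs_log_fc=1.0, min_abs_t=4.0):
    """Require both a minimum fold change and evidence of a difference.

    ASSUMPTION: both thresholds are arbitrary illustrative choices.
    """
    est, t = weighted_fit(values, weights)
    return abs(est) >= min_abs_log_fc and abs(t) >= min_abs_t


print("equal weights:  ", weighted_fit(y, equal_weights))
print("quality weights:", weighted_fit(y, quality_weights))
print("selected with quality weights:", passes(y, quality_weights))
```

With these assumed weights the two low values barely contribute, so the estimate rises and the residual variance falls. The fold-change requirement, in turn, filters out genes whose change is consistent but tiny.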

Gordon

> David Pritchard

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor