[BioC] Normalising divergent samples.

Matthew Hannah Hannah at mpimp-golm.mpg.de
Thu Sep 16 18:48:11 CEST 2004


Robert, Wolfgang,

Thanks for the response, this is exactly what I was looking for. There
is a lot to discuss, so I'll add some comments/questions.

First, some minor points - I realise (GC)RMA are exprs estimates, and
that the discussion applies to all normalisations. I was just referring
to them as a common/convenient reference.

My comment about people being unaware of the issue or ignoring it
applied to the 90% of users who use something like MAS or RMA and look
for changes without exploring the data further; it was not aimed at
BioC users or the wider bioinformatics community. Guessing that work is
being done on this, my posts were partly an attempt to learn more
(publicly or privately) about what is in the pipeline.

The paper by Holstege is interesting, but their approach has one
obvious (but unmentioned?) drawback. They quantify based on total RNA,
and the basic gist seems to be: if you have, say, a 2-fold increase in
mRNA, then normalising at the gene level will under-estimate the number
of increases, whereas with (total RNA) spike-ins you get a different
(true?) picture. I'm not the most knowledgeable in biochemistry, but
with mRNA at c.2% of total RNA, a 2% change in rRNA and tRNA levels
seems more likely than a 100% change in mRNA. Why not just measure
mRNA/total RNA in the original sample (I assume there is a method?) -
you could even do this afterwards.

This obviously leads on to the question of what you want to standardise
to. Total RNA is discussed above, and has the problem that rRNA/tRNA
could vary with growth rate or treatment (but maybe I'm wrong). Per
cell is obviously good (also used by Holstege) if possible, but I guess
that's fairly impractical for those of us working with tissue, and
calculations based on weight have problems with changes in cell
volume/water content. And there are cheaper ways to measure water
content than with a microarray ;-) I suppose at present northerns have
used total RNA, while RT-PCR and Q-PCR have standardised to
house-keeping genes.

Perhaps I'm being naive, but could standardising to mRNA level be as
valid as the alternatives (assuming you can't standardise per cell)? It
obviously has the limitations we're discussing, but it doesn't have the
(presumed) troubles discussed above. Obviously, if you get a large
number of changes biased in one direction, then there would be a
corresponding group of changes in the opposite direction. Anyway, I
think I might be getting lost in my thoughts there... One final
clarification - on Affy chips followed by 'standard' normalisation, you
have exprs values on a (cRNA/mRNA) proportional scale??

Right, now on to the points about big differences. I have multiple
genotypes x 3 reps, with treated and control, giving c.25 treated
versus 25 untreated arrays. I've looked at the p-values from a t-test
and from limma (ebayes(fit)) as you suggested, and they are fairly
flatly distributed, apart from the very lowest ones, and completely
uniform from around 0.4-0.5 up to 1. Above p=0.5 I have c.5000 genes,
which would mean I have c.13000 changes (c.55%, or about the fraction
of genes thought to be usually expressed!). How accurate is this
estimate (I'm not a statistician)? My previous estimate was based on
controlling the false positives, but I guess I should also worry about
the false negatives (how do you get an accurate measure of these?).
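
In case it helps to see the arithmetic, this is roughly the calculation
(a sketch; 'e' stands for the exprs matrix, and the layout with the 25
treated arrays in the first columns is just an assumption for
illustration):

  ## gene-wise t-test p-values; columns 1:25 treated, 26:50 untreated
  pvals <- apply(e, 1, function(x) t.test(x[1:25], x[26:50])$p.value)

  hist(pvals, breaks = 50)           # should be flat at the right end

  ## non-changing genes should give uniform p-values, so roughly half
  ## of them fall above 0.5
  n.null    <- 2 * sum(pvals > 0.5)  # estimated non-changing genes
  n.changed <- nrow(e) - n.null      # crude estimate of the changes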

If I look at the p-values for a 3-rep treated versus untreated
comparison, they are only uniform above about 0.75, and I get c.6500
genes with p>0.5, which corresponds to c.10000 changes. This is lower
than with more replicates (I assume because there is less statistical
power). My use of the Pearson correlation was an attempt to get around
the difference in replicate numbers, as many other studies only have
1-3 reps, and so I assumed the treated vs. untreated Pearson
correlation would indicate the approximate number and magnitude of
changes without being influenced by the number of reps? If that holds,
then many studies have smaller Pearson's than my 'treatment effect'.
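
To be concrete, the correlation I mean is just that of the mean
expression profiles (same assumed column layout as above):

  cor(rowMeans(e[, 1:25]), rowMeans(e[, 26:50]))  # treated vs. untreated

(Presumably noisier means from fewer reps would attenuate this a
little, so it may not be completely independent of the number of reps.)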

I guess my point about us seeing similar numbers of genes going up and
down is undermined by the fact that this is after normalisation. After
quantile normalisation, would you always see approximately equal
numbers up and down?
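
To illustrate why I ask, here is a sketch on simulated data (using
normalize.quantiles() from the affy package): a purely one-sided global
shift gets flattened out, because every array is forced onto the same
empirical distribution.

  library(affy)                        # for normalize.quantiles()
  set.seed(1)
  e <- matrix(rnorm(10000 * 4), ncol = 4)
  e[, 3:4] <- e[, 3:4] + 1             # arrays 3-4: everything shifted up
  en <- normalize.quantiles(e)
  summary(en[, 1] - en[, 3])           # differences now centred near zero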

Anyway, the discussion of spike-ins is really a side issue for me, as
this isn't established (?) for Affy chips and we already have the data.
In our case we are more interested in interpreting the differences
between the genotypes. However, my central interest is that, looking at
the genes that change in all genotypes, we see that these change more
(increased magnitude both up AND down) in some genotypes. This makes
biological sense, BUT how can we be sure it's not an artefact of
normalisation???

Finally, I guess the scaling is a simple use of apply, and I know how
to apply functions across columns, but what function would mean- or
median-scale? Do you just 'shift' the data so the means are equal, or
do you scale(?) to take the range/deviation into account?
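
To make the question concrete, these are the two options as I
understand them (a sketch, with 'e' an exprs matrix):

  ## (1) shift only: make all column medians equal to a common target
  med <- apply(e, 2, median)
  e.shift <- sweep(e, 2, med - median(med))

  ## (2) shift and rescale: also divide by a spread estimate (the MAD)
  ##     so that arrays share both location and scale
  e.scaled <- sweep(sweep(e, 2, med), 2, apply(e, 2, mad), "/")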

Sorry for the long reply; it's just that I've been waiting to discuss
this for a while. Hopefully the length won't stop further comments from
others.

Cheers,
Matt

On Wed, Sep 15, 2004 at 10:20:42AM +0200, Matthew  Hannah wrote:

>> I've tried to get a discussion on this several times but have got
>> very few responses.
>>
>> I'm looking at some data where the treatment has a very BIG effect,
>> but I don't think this is unusual, it's just that a lot of people
>> don't realise it or ignore it.

Hi Matthew,

Robert Gentleman and I just talked about the issue that you brought up,
and here are some of our points:

If by big effect you mean that a lot of genes change expression under
different conditions, then I think that many people are aware of the
issue, but that it has not been widely discussed. Frank Holstege and his
group in Utrecht have worked quite systematically on this, and I would
recommend looking at one of their recent papers:

Monitoring global messenger RNA changes in externally controlled
microarray experiments. van de Peppel J, Kemmeren P, van Bakel H,
Radonjic M, van Leenen D, Holstege FC. EMBO Rep. 2003
Apr;4(4):387-93. PMID: 12671682

You seemed to indicate that having a large fraction of genes changing
is only a problem for RMA and GCRMA, but our understanding is that it
is a problem for all "intrinsic" methods of normalizing, i.e. those
that do not use external spike-in controls. We believe that the problem
is largely to do with normalization, and not with computing expression
estimates.

As long as microarray experiments are not trying to measure absolute
molecule abundances, but rather just "relative expression", we think
that the problem of interpreting situations in which most genes change
will always remain hard.

On the other hand, if you use an "extrinsic" method, you need to decide
whether you want to measure the number of molecules per cell, or per
total RNA, or per what... so that's a conceptual issue that needs to be
worked out.

This is also known as a research *opportunity*.

>> If we take the average Pearson correlation of treated versus
>> untreated as a crude indication of the number of changes (is this
>> valid?) then in our experiments this is 0.96. Comparing 25 treated
>> versus 25 untreated replicates (GCRMA, LIMMA gene-wise fdr corrected
>> p<0.001) we get c.30% of transcripts on the chip changing!

You are too imprecise in your description for us to comment on whether
the method is valid, but why don't you use good old-fashioned
statistics: for each gene, calculate a p-value from e.g. a t-test or an
appropriate linear-model generalization, and look at the histogram of
p-values. The empirical p-values at the right end of the histogram
should be approximately uniformly distributed, and the number of
non-differentially expressed genes can be estimated as 2 times the
number of genes with p>0.5.

>> Looking at a couple of public datasets I don't think our treatment
>> effect (as indicated by the Pearson) is that unusual, it's just that
>> we have the statistical power to detect the changes. Also, looking
>> at the changes and considering the biology, it seems reasonable to
>> get these changes.

It depends a lot on what the factors are. Robert has some collaborators
who use treatments that greatly change things, and others that use
treatments so specific that the number of changed genes is under 10. No
global statement can be made here. For the former, they need to be
warned that their experiment lies outside of the currently available
technology, and then you do the best you can. With the latter, we
should be able to do a good job, although one may never find the signal
if p-value corrections are applied in a naive fashion (but that is a
different story).

>> In the discussions on RMA/GCRMA there are 2 assumptions discussed
>> 1)few genes changing - obviously not
>> 2)equal # up and down - despite the huge amount of changes there are
>> only 20 more transcripts going up compared to down - so yes.

As I said above, I do not believe this is specific to RMA or GCRMA, but
rather a general problem for all normalization methods.

Also, these are sufficient conditions, but not necessary. I.e. if they
are fulfilled, (GC)RMA can be guaranteed to work, but if they aren't,
the results from (GC)RMA, or other normalization methods, may still be
valid to a sufficient degree!

>> I've also looked at a number of control genes and can't find any real
>> bias, in fact there is quite a bit of (random?) variation, so if you
>> normalised on a few of these then you may get strange results...

I think the problem with using control genes is that there are
typically few of them and they do not necessarily span the range of
intensities, so they provide a poor basis for normalization. Although,
in the present case, they may be better than the alternatives.

>> Finally, I will at some point try separate GCRMAs and then scaling.
>> If anyone has any scripts for mean, robust mean or median scaling a
>> series of separate exprs sets then I'd appreciate it.

I hope that this is a rather simple use of lapply or similar
(depending on how you have your exprSets stored).
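
For example (a minimal sketch; 'esets' is assumed to be a list of
exprSet objects, and medianShift() is just an illustrative helper that
puts all column medians at a common value):

  library(Biobase)
  medianShift <- function(e) {
      med <- apply(e, 2, median)
      sweep(e, 2, med - median(med))
  }
  esets <- lapply(esets, function(x) {
      exprs(x) <- medianShift(exprs(x))
      x
  })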

  Regards,
    Robert, Wolfgang


