[BioC] RE: Normalising divergent samples.

Matthew Hannah Hannah at mpimp-golm.mpg.de
Mon Sep 27 18:06:14 CEST 2004


Hi,

Obviously I managed to dampen the discussion with my usual mix of stats
and biological questions, and the fact it was a few pages long ;-).
Anyway...

Having now read the Holstege paper fully, I'll offer my opinion (for what
it's worth). If you can count cells and standardise to cell number, and
know that there are unlikely to be any factors influencing your RNA
extraction efficiency, then it seems like a good way to overcome the
problem. However, if you are dealing with tissue samples and have factors
that might influence extraction efficiency (such as carbohydrate content),
then it could be a lot of work for very uncertain returns. I'm far from
convinced that standardising to total RNA (if cell number is not possible)
is any better than assuming equal mRNA.

In respect of existing affy data, I currently believe the best way to
proceed is to take an innocent-until-proven-guilty approach. I'm looking
at the raw data with minimal pre-processing, and unless I can show there
is a systematic bias that could explain the pattern we see then I'll
accept it. The question is how to look for such bias.

I'm basically using mas5 values (without mm correction), which give a
robust mean of the raw pm values. I've then used either no scaling, or
mean or median scaling (since asking, I managed this by modifying the
affy.scalevalue.exprSet function - very efficient).
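
In case it's useful, here is a rough sketch of the kind of scaling I mean
(not the exact code I used, and assuming the exprs values are on the
natural, un-logged scale):

    library(affy)   # for exprs() on an exprSet
    # scale each chip so its mean (or median) signal equals a common target
    scale.chips <- function(eset, method = c("mean", "median")) {
        method <- match.arg(method)
        x <- exprs(eset)
        chip.stat <- apply(x, 2, method)          # per-chip mean or median
        target <- mean(chip.stat)                 # common target value
        exprs(eset) <- sweep(x, 2, target / chip.stat, FUN = "*")
        eset
    }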

I think the best available approach is to look at a diverse range of
"house-keeping" genes and see whether they show any bias. Clearly there
is now considerable evidence that HK genes are not always constant, but
looking at many different ones should guard against this. And if HK genes
are good enough for verifying your data by Q-PCR, they should be good
enough here ;-)
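
Something like the following is roughly what I have in mind - the probe
set IDs are just placeholders for whatever HK genes you trust, and 'cond'
is assumed to be a factor giving the condition of each chip (replicates
sharing a level):

    hk.ids <- c("hk.probeset.1", "hk.probeset.2", "hk.probeset.3")  # placeholders
    hk <- exprs(eset)[hk.ids, ]
    # average the replicates within each condition
    hk.mean <- t(apply(hk, 1, function(g) tapply(g, cond, mean)))
    # systematic bias should show up as conditions that are consistently
    # high or low relative to each gene's own average
    round(sweep(hk.mean, 1, rowMeans(hk.mean), "/"), 2)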

I was also wondering whether 'absent' genes could be looked at in any
way. I was thinking along the lines that you could assume that the
background, non-specific hybridisation is constant between tissues or
treatments, and then look at some kind of ratio of non-specific to
specific signal in the raw intensity data. I haven't really thought much
about how to implement this (or whether it makes sense to do so!), so any
suggestions are welcome.
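
One very rough idea (assuming 'abatch' is the raw AffyBatch and 'eset'
the matching expression set) would be to use the MAS absent calls and
compare the average signal of absent probe sets to that of present ones
on each chip:

    calls <- exprs(mas5calls(abatch))      # "P"/"M"/"A" call per gene per chip
    sig <- exprs(eset)
    ns.ratio <- sapply(seq(ncol(sig)), function(j) {
        absent <- calls[, j] == "A"
        mean(sig[absent, j]) / mean(sig[!absent, j])   # non-specific : specific
    })
    # if background hybridisation really is constant, this ratio should not
    # drift systematically between tissues or treatments
    ns.ratio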

Also, once you've selected which genes to use (several sets would
obviously be better), you need some kind of statistical test to identify
potential bias. I think taking the mean between replicates is necessary to
reduce the noise, but then how do you compare the multiple mean expression
values, as there's no longer any replication? With multiple genes and 2
conditions this could be done using a paired t-test, but what about when
you have multiple genes and, say, 20 different conditions, with only 1
exprs value per gene per condition? In my mind the genes are acting as the
'replicates', but they obviously vary hugely - would a linear model be
what I'm looking for?
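
To make the question concrete, this is the sort of thing I was imagining
(whether it is statistically sensible is exactly what I'm asking) - a
two-way additive model on the log scale, with gene as a blocking factor,
fitted to the replicate-averaged genes x conditions matrix (hk.mean from
above):

    d <- data.frame(expr = as.vector(log2(hk.mean)),
                    gene = factor(rep(rownames(hk.mean), times = ncol(hk.mean))),
                    cond = factor(rep(colnames(hk.mean), each  = nrow(hk.mean))))
    fit <- lm(expr ~ gene + cond, data = d)
    anova(fit)   # a significant 'cond' term would suggest systematic bias

With only 2 conditions this should reduce to the paired t-test mentioned
above.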

Well there's another veritable essay, so I'd better stop for the moment
and hope that someone is still interested in this discussion.

Cheers,
Matt



> -----Original Message-----
> From: Matthew Hannah 
> Sent: Donnerstag, 16. September 2004 18:48
> To: 'bioconductor at stat.math.ethz.ch'
> Cc: 'rgentlem at jimmy.harvard.edu'; 'w.huber at dkfz-heidelberg.de'
> Subject: Normalising divergent samples.
> 
> Robert, Wolfgang,
> 
> Thanks for the response, this is exactly what I was looking 
> for. A lot to discuss so I'll add some comment/questions.
> 
> First, minor points - I realise (GC)RMA are exprs estimates, 
> and the discussion applies to all normalisations. I was just 
> referring to them as a common/convenient reference.
> 
> My comment on people being unaware or ignoring it applied to 
> the 90% of users who use something like MAS or RMA and look 
> for changes without exploring the data more, and was not aimed at 
> BioC users or the wider bioinf community. Guessing there is 
> work being done, my posts were partly an attempt to learn more 
> (publicly or privately) about what is in the pipeline.
> 
> The paper by Holstege is interesting but their approach has 
> one obvious (but unmentioned?) drawback. They quantify based 
> on total RNA, and the basic gist seems to be: if you have, say, 
> a 2-fold increase in mRNA, then normalising at the gene 
> level will under-estimate the number of increases, whereas with 
> (total RNA) spike-ins you get a different (true?) picture. 
> I'm not the most knowledgeable in biochem, but with mRNA at c.2% 
> of total RNA, a 2% change in rRNA and tRNA levels seems 
> more likely than a 100% change in mRNA. Why not just measure 
> mRNA/total RNA in the original sample (I assume there is a 
> method?) - you could even do this afterwards.
> 
> This obviously leads on to the point of what you want to 
> standardise to. Total RNA is discussed above and has the 
> problem that rRNA/tRNA could vary with growth rate or 
> treatment (but maybe I'm wrong). Per cell is obviously good 
> (also used by Holstege) if possible, but I guess that's fairly 
> impracticable for those of us working with tissue, and 
> calculations based on weight have problems with changes in 
> cell volume/water content. And there are cheaper ways to 
> measure water content than with a microarray ;-) I suppose at 
> present northerns have used total RNA, while RT- and Q-PCR have 
> standardised to house-keeping genes.
> 
> Perhaps I'm being naive, but could standardising to mRNA level 
> be as valid as the alternatives (assuming you can't standardise 
> per cell)? It obviously has the limitations we're discussing, but 
> doesn't have (the presumed) troubles discussed above. 
> Obviously if you get a large number of changes biased in one 
> direction then there would be a corresponding group of 
> changes in the opposite direction. Anyway, I think I might be 
> getting lost in my thoughts there... One final clarification - 
> on affy chips followed by 'standard' normalisation, you have 
> exprs values on a (cRNA / mRNA) proportional scale??
> 
> Right, now on to the points about big differences. I have 
> multiple genotypes x 3 reps, with treated and control, giving 
> c.25 treated versus 25 untreated. I've looked at the p-values 
> from a t-test and limma (ebayes(fit)) as you suggested, and 
> they are fairly flatly distributed, except for the very lowest 
> ones, and completely uniform from around 0.4-0.5 to 1. Above 0.5 I 
> have c.5000 genes, which would mean I have c.13000 changes (c.55%, 
> or the proportion of genes thought to be usually expressed!). How 
> accurate is this estimate (I'm not a statistician)? My 
> previous estimate was based on controlling the false 
> positives, but I guess I should also worry about the false 
> negatives (how do you get an accurate measure of these?).
> 
> If I look at the p-values for a 3-rep treated versus untreated 
> comparison, they are only uniform above about 0.75, and I get c.6500 
> above 0.5, which works out to c.10000 changes. This is lower than with 
> more replicates (I assume because you have less statistical power). My 
> use of the Pearson correlation was an attempt to get over the difference 
> in replicate number, as many other studies only have 1-3, and so I 
> assumed the treated vs. untreated Pearson correlation would indicate the 
> approximate number and magnitude of changes and is not influenced 
> by the number of reps? If it does, then many studies have 
> smaller Pearson's than my 'treatment effect'.
> 
> I guess my point about us seeing similar numbers of genes 
> going up and down is flawed by the fact this is after 
> normalisation. After quantile normalisation would you always 
> see approx equal up and down?
> 
> Anyway discussion on spike-ins is really a side issue for me 
> as this isn't established (?) with affy chips and we already 
> have the data. In our case we are more interested in 
> interpreting the differences between the genotypes. However, 
> my central interest is that looking at the genes that change 
> in all genotypes we see that these change more (increased 
> magnitude of up AND down) in some genotypes. This makes 
> biological sense BUT how can we be sure it's not an artefact 
> of normalisation??? 
> 
> Finally, I guess the scaling is a simple use of apply - I 
> know how to apply functions across columns, but what function 
> would do mean or median scaling? Do you just 'shift' the data so 
> the means are equal, or do you scale(?) to take the 
> range/deviation into account?
> 
> Sorry for the long reply, it's just I've been waiting to 
> discuss this for a while. Hopefully the length won't stop 
> further comment from others.
> 
> Cheers,
> Matt
> 
> 
> On Wed, Sep 15, 2004 at 10:20:42AM +0200, Matthew Hannah wrote:
> 
> >> I've tried to get a discussion on this several times but have got 
> >> very few responses.
> >>
> >> I'm looking at some data where the treatment has a very BIG effect, 
> >> but I don't think this is unusual, it's just that a lot of people 
> >> don't realise it or ignore it.
> 
> Hi Matthew,
> 
> Robert Gentleman and I just talked about the issue that 
> you brought up, and here are some of our points:
> 
> If by big effect you mean that a lot of genes change 
> expression under different conditions, then I think that many 
> people are aware of the issue, but that it has not been 
> widely discussed. Frank Holstege and his group in Utrecht 
> have worked quite systematically on this, and I would 
> recommend looking at one of their recent papers:
> 
> Monitoring global messenger RNA changes in externally 
> controlled microarray experiments. van de Peppel J, Kemmeren P, 
> van Bakel H, Radonjic M, van Leenen D, Holstege FC. 
> EMBO Rep. 2003 Apr;4(4):387-93. PMID: 12671682
> 
> You seemed to indicate that having a large fraction of genes 
> changing is only a problem for RMA and GCRMA, but our 
> understanding is that it is a problem for all "intrinsic" 
> methods of normalizing, i.e. those that do not use external 
> spike-in controls. We believe that the problem is largely to 
> do with normalization, and not with computing expression estimates.
> 
> As long as microarray experiments are not trying to measure 
> absolute molecule abundances, but rather just "relative 
> expression", then we think that the problem of interpreting 
> situations in which most genes change will always remain hard.
> 
> On the other hand, if you use an "extrinsic" method, you 
> need to decide whether you want to measure the number of 
> molecules per cell, or per total RNA, or per what ... so 
> that's a conceptual issue that needs to be worked out.
> 
> This is also known as a research *opportunity*.
> 
> >> If we take the average Pearson correlation of treated versus 
> >> untreated as a crude indication of the number of changes (is this 
> >> valid?) then in our experiments this is 0.96. Comparing 25 treated 
> >> versus 25 untreated replicates (GCRMA, LIMMA gene-wise fdr corrected 
> >> p<0.001) we get c.30% of transcripts on the chip changing!
> 
> You are too imprecise in your description for us to comment 
> on whether the method is valid, but why don't you use good 
> old-fashioned statistics: for each gene, calculate a p-value 
> from e.g. a t-test or an appropriate linear model 
> generalization, and look at the histogram of p-values. The 
> histogram should be approximately uniform (flat) at the right 
> end, and the number of non-differentially expressed genes can 
> then be estimated as 2 times the number of genes with p > 0.5.
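> 
> As a minimal sketch of that estimate (assuming 'pvals' is the vector 
> of per-gene p-values, e.g. the p.value component from limma's ebayes 
> output):
> 
>     hist(pvals, breaks = 50)          # should be roughly flat at the right end
>     n.null <- 2 * sum(pvals > 0.5)    # estimated number of non-changing genes
>     n.diff <- length(pvals) - n.null  # rough estimate of the number of changes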
> 
> >> Looking at a couple of public datasets I don't think our treatment 
> >> effect (as indicated by the Pearson) is that unusual, it's just that 
> >> we have the statistical power to detect the changes. Also looking at 
> >> the changes, and considering the biology it seems reasonable to get 
> >> these changes.
> 
> It depends a lot on what the factors are. Robert has some 
> collaborators who use treatments that greatly change things, 
> and others that use treatments that are so specific that the 
> number of changed genes is under 10. No global statement can 
> be made here. For the former, they need to be warned that 
> their experiment lies outside of the currently available 
> technology, and then you do the best you can. With the 
> latter, we should be able to do a good job, although one may 
> never find the signal if p-value corrections are applied in a 
> naive fashion (but that is a different story).
> 
> >> In the discussions on RMA/GCRMA there are 2 assumptions discussed: 
> >> 1) few genes changing - obviously not; 2) equal # up and down - 
> >> despite the huge amount of changes there are only 20 more 
> >> transcripts going up compared to down - so yes.
> 
> As I said above, I do not believe this is specific to RMA or 
> GCRMA, but rather a general problem for all normalization methods.
> 
> Also, these are sufficient conditions, but not necessary. 
> I.e. if they are fulfilled, (GC)RMA can be guaranteed to 
> work, but if they aren't, the results from (GC)RMA, or other 
> normalization methods, may still be valid to a sufficient degree!
> 
> >> I've also looked at a number of control genes and can't find any real 
> >> bias, in fact there is quite a bit of (random?) variation, so if you 
> >> normalised on a few of these then you may get strange results...
> 
> I think the problem with using control genes is that there 
> are typically few of them and they do not necessarily span 
> the range of intensities, so they provide a poor basis for 
> normalization. Although, in the present case they may be 
> better than the alternatives.
> 
> >> Finally, I will at some point try separate GCRMAs and then scaling. 
> >> If anyone has any scripts for mean, robust mean or median scaling a 
> >> series of separate exprs sets then I'd appreciate it.
> 
> I hope that this is a rather simple use of lapply or similar 
> (depending on how you have your exprSets stored).
> 
>   Regards,
>     Robert, Wolfgang
> 
>


