[BioC] RMA-bimodality:

Wed Jun 7 00:06:26 CEST 2006

Hi, Wolfgang,

Yes, mRNA abundances do indeed span the whole range. What I meant was that
all the distributions of intensities I have observed seem to be poorly
modeled by a single mathematical distribution (that is what I meant by my
poor choice of the term "truly unimodal"). Rather, two overlaid (added)
distributions seem to model the observed data better, with the first
distribution (presumably "absent") spanning the lower part of the range, and
the second ("present", presumably modeling the mRNA abundances) spanning the
*entire* range, but with a higher mean. The lower distribution would
represent the much larger set of probes whose intensities are due only to
cross-hyb, NSB, and background, with no true target mRNA signal. Its density
peak is therefore much higher than that of the other. Although there is significant
overlap between the two, the two means are separated and distinct, so the
sum is a bimodal distribution. 
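
A minimal sketch of the two-component picture described above, in Python
(standard library only). The weights, means, and standard deviations are
made-up illustrative values on the log2 scale, not fitted to any array data,
and using normal components is itself an assumption. The sketch adds a tall,
narrow "absent" component to a broad "present" one and looks for local
maxima of the summed density on a grid.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical parameters on the log2 intensity scale (assumptions).
W_ABSENT, MEAN_ABSENT, SD_ABSENT = 0.7, 7.0, 0.6      # large, narrow, low
W_PRESENT, MEAN_PRESENT, SD_PRESENT = 0.3, 10.5, 1.8  # smaller, broad, higher mean

def mixture_pdf(y):
    """Sum of the weighted 'absent' and 'present' densities."""
    return (W_ABSENT * normal_pdf(y, MEAN_ABSENT, SD_ABSENT)
            + W_PRESENT * normal_pdf(y, MEAN_PRESENT, SD_PRESENT))

def local_maxima(xs, fs):
    """Return (x, f) pairs where f is strictly larger than both neighbours."""
    return [(xs[i], fs[i]) for i in range(1, len(xs) - 1)
            if fs[i - 1] < fs[i] > fs[i + 1]]

grid = [4.0 + 0.01 * i for i in range(1201)]  # log2 intensities from 4 to 16
dens = [mixture_pdf(y) for y in grid]
modes = local_maxima(grid, dens)

for y, f in modes:
    print(f"mode at log2 intensity {y:.2f}, density {f:.3f}")
```

With these particular values the summed density should show one tall peak
near the "absent" mean and a much lower one near the "present" mean. Moving
the means closer together or widening the "absent" component can merge them
into a single mode, so the result depends on the chosen parameters.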

Again, this is simply based on my observations of log2-transformed intensity
values. In fact, isn't it the main purpose of any of the intensity
processing methods (MAS, RMA, GCRMA, etc.) to detect and increase the
difference between the two distributions, so as to help distinguish signal
from noise?
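
Since these observations are on the log2 scale, and the quoted discussion
below turns on whether the number of modes survives a change of scale, here
is a small Python sketch (standard library only) of that dependence. It
assumes a hypothetical two-component normal mixture for log2 intensity with
made-up parameters, converts it to a density on the raw intensity scale with
the usual change-of-variables factor 1/(x ln 2), and counts local maxima on
both scales.

```python
import math

LN2 = math.log(2.0)

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def log2_density(y):
    """Hypothetical density of y = log2(intensity); parameters are assumptions."""
    return 0.7 * normal_pdf(y, 7.0, 0.6) + 0.3 * normal_pdf(y, 10.5, 1.8)

def raw_density(x):
    """Density of the raw intensity x = 2**y, using dy/dx = 1 / (x ln 2)."""
    return log2_density(math.log2(x)) / (x * LN2)

def count_modes(values):
    """Count strict local maxima in a sequence of density values."""
    return sum(1 for i in range(1, len(values) - 1)
               if values[i - 1] < values[i] > values[i + 1])

log2_grid = [4.0 + 0.01 * i for i in range(1201)]  # log2 intensities 4 to 16
raw_grid = range(16, 65537)                        # raw intensities 2**4 to 2**16

modes_log2 = count_modes([log2_density(y) for y in log2_grid])
modes_raw = count_modes([raw_density(x) for x in raw_grid])

print("modes on log2 scale:", modes_log2)
print("modes on raw scale: ", modes_raw)
```

For these assumed parameters the same mixture should come out with two
modes on the log2 scale and only one on the raw scale, because the factor
1/x suppresses the broad upper component relative to the tall lower one.
Other parameter choices can give the same mode count on both scales.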

Regards,
- Peter

> -----Original Message-----
> From: Wolfgang Huber [mailto:huber at ebi.ac.uk] 
> Sent: Tuesday, June 06, 2006 12:15 PM
> To: Peter G. Warren
> Cc: noel0925 at sbcglobal.net; bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] RMA-bimodality:
> 
> Hi Peter,
> 
> - doesn't the distribution of mRNA abundances (i.e. physical 
> concentrations measured e.g. in average no. of molecules per 
> cell) span the whole range from just undetectably above zero 
> to very large? I am not sure what mechanism would then result 
> in two distinct peaks of fluorescences, one for "absent" and 
> one for "present" mRNAs.
> 
> - I tried to find a definition of "truly unimodal 
> distributions" (and I suppose, "falsely unimodal 
> distributions"), but couldn't find one, can you advise?
> 
> Cheers
>  Wolfgang
> 
> Peter G. Warren wrote:
> > Hi, Wolfgang, Noel,
> > 
> > It is true that a non-linear transformation can change the number of
> > modes of the data, and that that transformation can be sufficient to
> > explain the bimodality we see in background-corrected data. However,
> > in my experience, the raw probe-level data is itself bimodal. When
> > there is some real signal present, the probe-level intensities are
> > actually from two different distributions. The first ("absent") is
> > where there is no positive transcript binding, only cross-hyb,
> > non-specific binding, and background. The second ("present") is all
> > that, plus true target transcript binding. This bimodality is more
> > evident with log-transformed values. (In contrast, a
> > log-transformation of a truly unimodal distribution, such as
> > density(rnorm(...)), is still unimodal.) In every case I've looked at,
> > the "absent" distribution dwarfs the "present" one, so it often looks
> > like one mode, before log transformation. After log transformation, I
> > have been unable to model the data successfully with a single
> > distribution; it always takes two.
> > 
> > Regards,
> > - Peter Warren
> > 
> >> Hi Noel,
> >>
> >>> Just so that I am clear- the point is that the bimodality is not an
> >>> artifact of the convolution, but simply the fact that the number of
> >>> modes of a distribution is not conserved under monotonous
> >>> transformations.
> >> No, I did not say that, and I do not know how to understand this 
> >> sentence, since "the convolution" is directly related to "the 
> >> monotonous transformation" that we are talking about
> >>
> >>> This is why the paper points to the
> >>> fact that the histograms of log2(PMs/MMs) stratified by log2(PMs)
> >>> are bimodal
> >> I leave the exegesis of the paper to its authors.
> >>
> >>> -so bimodality is a more
> >>> general property of the probe level data.
> >> As you have just said yourself, the number of modes is not a property
> >> of the data, but of the data plus the particular (non-linear)
> >> transformation that you choose to apply to them.
> >>
> >>
> >> Best wishes
> >>  Wolfgang.
> > 
> 
> 
> --
> ------------------------------------------------------------------
> Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber
> 
>