[BioC] RMA-bimodality:

Mon Jun 5 18:57:36 CEST 2006

Hi,

I am surprised that anybody is surprised about the different number of
modes ("peaks"): the number of modes of a distribution is not conserved
under nonlinear monotonic transformations (such as the background
correction in RMA). This simply follows from the chain rule.
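
To spell out the chain-rule step (a sketch that is not part of the original
message; it assumes a strictly increasing, twice-differentiable
transformation g with inverse h, and the symbols f_X, f_Y, g and h are
introduced here only for illustration):

```latex
Y = g(X), \qquad h = g^{-1}, \qquad f_Y(y) = f_X\bigl(h(y)\bigr)\, h'(y)

f_Y'(y) = f_X'\bigl(h(y)\bigr)\, h'(y)^2 + f_X\bigl(h(y)\bigr)\, h''(y)
```

The second term vanishes everywhere only when h is affine, so in general the
stationary points of f_Y are not the images of those of f_X, and modes can
appear or disappear.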

See below for a simple example with some "mock" microarray intensities z
and the density of the log-transformed values before and after a
(primitive) background correction.

Cheers
 Wolfgang

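set.seed(123)

n = 100000
# mock intensities: constant background of 20 plus a two-component
# log-normal signal
z = 20 + exp(c(rnorm(n), 3+rnorm(n)))

par(mfrow=c(1,2))
plot(density(log2(z)))      # before background correction
plot(density(log2(z-20)))   # after subtracting the background of 20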
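
The comparison can also be made numerically rather than by eye. The following
is only a rough sketch, not part of the original example: count_modes is a
hypothetical helper, and its adjust and min_rel_height settings are arbitrary
assumptions chosen to suppress sampling wiggles and tiny bumps in the tails.

```r
# Hypothetical helper (not from the original post): count local maxima of a
# kernel density estimate, ignoring bumps below a fraction of the highest peak.
count_modes <- function(x, adjust = 2, min_rel_height = 0.05) {
  d <- density(x, adjust = adjust)   # adjust > 1 smooths more than the default
  y <- d$y
  k <- length(y)
  mid <- y[2:(k - 1)]
  # interior grid points higher than the left neighbour and
  # not lower than the right neighbour
  is_peak <- mid > y[1:(k - 2)] & mid >= y[3:k]
  sum(is_peak & mid >= min_rel_height * max(y))
}

set.seed(123)
n <- 100000
z <- 20 + exp(c(rnorm(n), 3 + rnorm(n)))

modes_before <- count_modes(log2(z))        # before background subtraction
modes_after  <- count_modes(log2(z - 20))   # after subtracting the background of 20
print(c(before = modes_before, after = modes_after))
```

If the argument above holds, one would expect fewer modes before the
subtraction than after it. The count depends on the amount of smoothing, so
it is worth varying adjust before reading much into the result.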

noel0925 at sbcglobal.net wrote:
> In the paper: Exploration, Normalization and Summaries
> of High Density Oligonucleotide Array Probe Level Data
> the following statement regarding the
> bimodality of log2(PM) values and RMA background
> corrected PM values can be found: "The same bimodal
> effect is seen when we stratify by log2(PM), thus it
> is not an artifact of conditioning on sums." (p4).
> I am a little confused by this, as I thought that it was
> indeed an artifact of the convolution!
> 
> Clearly, the background corrected intensity
> values are given by E(S | O), the conditional
> expectation of the signal given what we observe, where
> the observed signal is the convolution of a normally
> distributed background (B) with mean mu and variance sigma^2
> (B ~ N(μ, σ^2)) and an exponentially distributed
> signal (S) with mean alpha (S ~ exp(α)).
> 
> There have been several postings regarding this matter
> in the Bioconductor archives and all seem to point to
> this. Have I misunderstood?
> 
> In particular was the following post:
> https://stat.ethz.ch/pipermail/bioconductor/2004-August/005908.html
> (See below the response from zwu at jhsph.edu)
> 
> The original question I got was about the bimodal
> distribution of gcrma
> result from probe intensities with unimodal
> distribution. My answer was
> that the "change" was not necessarily surprising.
> 
> For example, when you have "true log signal" from a
> bimodal distribution
> logS=c(rnorm(1000,3,1),rnorm(1000,8,2))
> # You will see this has two peaks
> par(mfrow=c(2,2))
> plot(density(logS))
> # if the background, log(non-specific binding), comes from
> logB=rnorm(2000,6,1)
> # then when you plot the density of the convolution on the log scale,
> plot(density(log(exp(logS)+exp(logB))))
> # you see only one peak, and this would be "before gcrma".
> 
> This explanation made sense to me, but seems to
> contradict what is stated in the paper.
> 
> Also, can someone explain the difference between RMA
> background version1 vs version2?
> 
> 
> Best regards,
> Noel
> 

-- 
------------------------------------------------------------------
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber