[BioC] GCRMA/RMA bimodal distribution

Naomi Altman naomi at stat.psu.edu
Tue Aug 31 22:13:54 CEST 2004


I have used RMA and MAS on ATH arrays, and the distributions are bimodal 
(both probe-wise and for probesets).  Setting a p-value threshold at about .05 
(MAS) removes the lower peak.  But, like others on this list, I do not 
really take the p-values too seriously.

I am not sure why I should care about the bimodality.  The methods I use, 
such as t-tests and limma, require normality within genes across arrays, and 
(possibly) a distribution for the variance of the genes, but say nothing 
otherwise about the distribution of genes on the same array.
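
A minimal sketch of this point, using simulated values rather than real 
array data (the mixture means, standard deviations and object names are 
all made up for illustration): gene-level means come from two groups, so 
each array is bimodal across genes, while the values for any one gene are 
normal across arrays.

```r
set.seed(1)
n_genes  <- 2000
n_arrays <- 6

# Gene-level means from a two-component mixture (assumed values):
# a low "absent" group and a higher "expressed" group.
gene_mean <- c(rnorm(n_genes / 2, mean = 4, sd = 0.5),
               rnorm(n_genes / 2, mean = 9, sd = 1))

# Within each gene, values are normal around that gene's mean.
# matrix() fills by column, so row i always gets gene_mean[i].
expr <- matrix(rnorm(n_genes * n_arrays,
                     mean = rep(gene_mean, n_arrays), sd = 0.3),
               nrow = n_genes, ncol = n_arrays)

# Density of a single array across genes: two peaks with a dip between.
d <- density(expr[, 1])
dens_at <- approx(d$x, d$y, xout = c(4, 6.5, 9))$y

# Within-gene residuals across arrays: unimodal, centred on zero.
resid <- as.vector(expr - rowMeans(expr))
resid_sd <- sd(resid)
```

plot(d) shows the two peaks, and qqnorm(resid) shows that the within-gene 
residuals are close to a straight line, which is the only normality the 
gene-wise tests rely on.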

--Naomi

At 06:06 PM 8/31/2004 +0200, Matthew  Hannah wrote:
>Hi,
>
>Sorry for including the developers, but I guess you are the only ones
>that will be able to answer this, (and I'm not sure BioC accepts .docs).
>I saw a comment from Jean addressing the same question but couldn't find
>the reply he referred to.
>
>https://www.stat.math.ethz.ch/pipermail/bioconductor/2004-August/005769.html
>
>It seems the mouse chip exprs values have a double peak after gcrma
>(looking at a density plot).
>
>As I'd received no response I've been doing some investigating (see
>attached). Basically gcrma gives a single peaked distribution only for
>U95 human chips (optimised with these?). Double peaks for exprs
>estimates appear in the following - U133A(least) - Drosgenome1 - ATH1
>(worst).
>
>To a lesser extent this also occurs with RMA. U133A has a single wide
>peak, and then they get worse in the order Dros1 - U95 - ATH1 (The last
>two have obvious double peaks).
>
>From what has been said, this is likely to be a problem of BG correction.
>I don't know if there are opportunities to change this for RMA, but in
>GCRMA there are tuning factors, and I don't know if the ad-hoc estimate
>(rather than the full model) is causing this to happen. Turning off the
>optical correction had no effect.
>
>I wanted to play about with GCRMA to see if the distribution changed
>with the tuning factors but currently I seem to have an error (see
>below) with gcrma and justGCRMA not finding gcrma.bg.transformation, and
>I'm not sure how k should be expressed.
>
>I know people should look more at their data, but with the ease of
>just(GC)RMA and RMAexpress I know a lot of people are just computing
>expression measures for different chip types without looking at the
>density of the returned expression values. Clearly these people are going
>to be working with data that may be skewed in some way.
>
>I guess that each chip type will need its BG correction optimising for
>RMA and GCRMA to allow for a better estimate of true expression levels
>and changes. I really hope this can be fixed as RMA and GCRMA seem to be
>really useful expression measures and it would be a shame to have to
>find alternative methods just because they are not optimised for your
>chip type.
>
>Thanks in advance,
>Matt
>
>R devel 2.0, win2k
>affy 1.5.2 (I know it's not the latest but getBioC is not working for me
>at the moment)
>gcrma 1.1.0
>  <<Exprs_meas_comp.doc>>
>
> > esetgcrma_slow <- gcrma(raw,fast=FALSE)
>Computing affinities.Done.
>Adjusting for optical effect.........Done.
>Adjusting for non-specific binding.Error in bg.adjust.fullmodel(pms[,
>i], mms[, i], pm.affinities, mm.affinities,  :
>         couldn't find function "gcrma.bg.transformation"
> > esetgcrma_slow <- justGCRMA(fast=FALSE)
>Computing affinities..Done.
>Adjusting for optical effect..........Done.
>Adjusting for non-specific binding.Error in bg.adjust.fullmodel(pms[,
>i], mms[, i], pm.affinities, mm.affinities,  :
>         couldn't find function "gcrma.bg.transformation"
> > esetgcrma_k4 <- justGCRMA(k=4*fast+0.5*(1-fast))
>Computing affinities..Done.
>Adjusting for optical effect..........Done.
>Adjusting for non-specific binding.Error in
>gcrma.bg.transformation.fast(pms, bhat, var.y, k = k) :
>         Object "fast" not found
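
The last error looks like ordinary R argument evaluation rather than 
anything specific to the data, assuming the expression for k was copied 
from a default argument. A default is evaluated inside the function, where 
fast exists; the same expression supplied by the caller is evaluated in the 
caller's workspace, where it does not. A minimal sketch with a hypothetical 
stand-in function (bg_sketch is not part of gcrma):

```r
# Hypothetical stand-in, NOT the real gcrma code: the default for k
# refers to another argument, fast.
bg_sketch <- function(k = 4 * fast + 0.5 * (1 - fast), fast = TRUE) {
  k
}

# Default used: the expression is evaluated inside the function,
# where fast is visible, giving 4 when fast = TRUE.
default_k <- bg_sketch()

# Same expression passed by the caller: it is evaluated in the calling
# environment, so it fails unless an object called fast exists there.
err <- tryCatch(bg_sketch(k = 4 * fast + 0.5 * (1 - fast)),
                error = function(e) conditionMessage(e))

# Passing a plain number avoids the problem.
numeric_k <- bg_sketch(k = 4)
```

If that is what is happening here, giving k as a number (for example k=4) 
should get past this particular error.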
>
>Hi,
>
>This has been mentioned before in the context of rma and that it was an
>artifact of BG correction.
>
>http://files.protsuggest.org/biocond/html/5066.html
>
>I was very surprised to see that gcrma also gave a very pronounced bimodal
>distribution. When comparing samples, obviously the relative positions
>of the 2 peaks may influence observed expression changes. Would such
>peak shifts be more likely in divergent samples, and if anyone wants to
>comment on those.... ;-)
>
>This example is using 12 chips (biological reps). But I initially
>noticed it using 3 and 6 chips in rma.
>
>Hope attachment works.
>
>Cheers,
>Matt
>
>
>-------------- next part --------------
>A non-text attachment was scrubbed...
>Name: gcrma_dist.png
>Type: image/png
>Size: 6633 bytes
>Desc: gcrma_dist.png
>Url : https://stat.ethz.ch/pipermail/bioconductor/attachments/20040825/083e56a5/gcrma_dist.png
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111


