[Bioc-devel] [BioC] Peculiar behaviour of normalize.quantiles (affy, preprocessCore) if there are NA data

Gordon Smyth smyth at wehi.EDU.AU
Fri Jul 13 04:25:43 CEST 2007

Hi Ben,

My recollection is that normalize.quantiles() doesn't do quite the 
right thing with ties when the number of ties is even, see


I thought Seth and Wolfgang's concern was more with NAs though.


>Date: Thu, 12 Jul 2007 16:47:31 -0700 (PDT)
>From: bmb at bmbolstad.com
>Subject: Re: [Bioc-devel] Bioc-devel Digest, Vol 40, Issue 3
>To: "Gordon Smyth" <smyth at wehi.EDU.AU>
>Cc: bioc-devel at stat.math.ethz.ch
>Note that the current C code does appropriately handle ties (depending on
>your definition of appropriate) and has done so for a long time (over 5
>years).
>Cut from code comment:
>" ** Apr 19, 2002 - Update to deal more correctly with ties (equal rank)"
> >>Hi Seth & Ben,
> >>
> >>thanks for your clarifying comments!
> >>
> >> > [moved to bioc-devel, where this should have started I think]
> >>
> >>Sorry if I have been stepping on feet... the reason for posting to the
> >>bioc user list was that more than once I have (sadly) seen people
> >>looking at histogrammes such as that of qx shown in my previous post,
> >>and using the suggested "cutoff" e.g. to discriminate between expressed
> >>and un-expressed genes, and the like. I hope that this does not sound too
> >>presumptuous, but I think it is a good thing to educate users to
> >>critically assess such results.
> >>
> >>Btw, normalizeQuantiles from the limma package appears to deal with NA
> >>values more gracefully (but it is written in R, hence slower). I think
> >>it assumes that the missingness mechanism is random.
> >
> > Yes it does.
> >
> > The reason the R version is a bit slower than C is mainly because of
> > the need to handle NAs and to treat ties carefully. Without these
> > considerations, the R implementation is nearly as fast. Try
> > normalizeQuantiles(ties=FALSE) for more speed.
> >
> > Regards
> > Gordon
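
To make the two considerations discussed above concrete (ties and NAs), here is a minimal sketch of one way quantile normalization can handle both. It is written in plain Python for illustration only and is not the limma or preprocessCore code. The function names (interp, average_ranks, quantile_normalize) are hypothetical. It assumes columns of equal length with at least two rows, at least two observed values per column, and NaN marking a missing value that is missing at random.

```python
import bisect
import math


def interp(xs, ys, x):
    """Linearly interpolate the points (xs, ys) at x; xs must be increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    hi = bisect.bisect_right(xs, x)  # xs[hi - 1] <= x < xs[hi]
    lo = hi - 1
    w = (x - xs[lo]) / (xs[hi] - xs[lo])
    return ys[lo] + w * (ys[hi] - ys[lo])


def average_ranks(values):
    """Return 1-based ranks; tied values all receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def quantile_normalize(columns):
    """Quantile-normalize a list of columns; NaN entries stay NaN."""
    n = len(columns[0])
    grid = [i / (n - 1) for i in range(n)]  # common probability grid

    # Step 1: target distribution = mean of the columns' sorted observed
    # values, each interpolated onto the common grid so that columns with
    # missing values still contribute n points.
    target = [0.0] * n
    for col in columns:
        obs = sorted(v for v in col if not math.isnan(v))
        m = len(obs)
        pos = [j / (m - 1) for j in range(m)]
        for i, p in enumerate(grid):
            target[i] += interp(pos, obs, p) / len(columns)

    # Step 2: map each observed value to the target quantile at its rank.
    # Average ranks guarantee that tied inputs get identical outputs.
    result = []
    for col in columns:
        obs_idx = [i for i, v in enumerate(col) if not math.isnan(v)]
        m = len(obs_idx)
        ranks = average_ranks([col[i] for i in obs_idx])
        new_col = [math.nan] * n
        for i, r in zip(obs_idx, ranks):
            new_col[i] = interp(grid, target, (r - 1) / (m - 1))
        result.append(new_col)
    return result


columns = [[1.0, 2.0, 3.0, 4.0], [2.0, 2.0, math.nan, 6.0]]
normalized = quantile_normalize(columns)
```

In this example the two tied 2.0 entries of the second column come out equal, and the missing entry stays missing. The rank averaging and the interpolation are the extra work that a version ignoring ties and NAs could skip.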
