[Bioc-devel] [BioC] Peculiar behaviour of normalize.quantiles (in affy, preprocessCore) if there are NA data

Gordon Smyth smyth at wehi.EDU.AU
Fri Jul 13 04:20:27 CEST 2007

Hi Ben,

Mmm. You're asking me if I'd consider making limma depend on 
preprocessCore, which I know nothing about yet, to use a function that
hasn't been fixed yet. I think a deep breath might be appropriate here. :)

All I know about this whole thread is the two emails I saw today from 
Seth and Wolfgang. I have other things to worry about at the moment, 
and trust that you have it in hand.

To answer your question, of course I'll consider using core tools. My 
policy has always been to make use of software which works well and 
does the job, as for example I make intensive use of your affy 
package. In my experience, the best way to get your software used is 
to make it useful to people. Attempting to police people's use is 
futile, and gets the whole software development process the wrong way around.

Don't forget that the function normalizeQuantiles() has been around 
for more than 5 years, and we corresponded on NAs and ties long ago.


At 09:36 AM 13/07/2007, bmb at bmbolstad.com wrote:
>Is there any chance that you'd consider having limma depend on a "fixed"
>version in preprocessCore rather than having your own separate code?
>Note, I am not trying to pick on you specifically since I know there are a
>number of other quantile normalization implementations in various
>packages. Additionally my personal stance has always been that I am not
>going to play policeman on this issue and developers are free to make
>their own choices.
>In any case, addressing this issue has been pushed to the top of my stack (likely
>this upcoming weekend).
> >>Ben Bolstad <bmb at bmbolstad.com> writes:
> >>
> >> > Wolfgang,
> >> >
> >> > The code in preprocessCore for quantile normalization shows its legacy:
> >> > it was developed around probe-level Affymetrix data read straight
> >> > from CEL files, where NA values are not to be expected. There may or may
> >> > not be comments to that effect in the C code documentation (actually
> >> > there are, further down in the qnorm.c file, for a slight variation on
> >> > the implementation).
> >> >
> >> > If you are willing to make the assumption that the missing-data
> >> > mechanism is "missing at random", then I think the fix is fairly
> >> > trivial: just estimate the distribution using the non-missing data. If
> >> > it is instead driven by, say, a truncation mechanism, a different fix
> >> > would be needed.
> >> >
> >> > In either case, I don't think the current situation is desirable, and
> >> > it should be fixed.
> >>
> >>How about:
> >>
> >>1. Let's add code to check for and raise an error if any NAs are
> >>    found.  This should be easy and can be done quickly.
> >>
> >>2. Then we could consider adding an argument that allows NAs and
> >>    handles things under the missing at random assumption, along with
> >>    documentation.
> >>
> >>+ seth
> >
> > As noted later by Wolfgang, the normalizeQuantiles() function in
> > limma does exactly this.
> >
> > Regards
> > Gordon
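For illustration only, here is a minimal Python sketch of the two-step plan discussed in this thread. By default it refuses NA input (step 1). With a hypothetical `allow_na=True` flag (step 2), it estimates the reference distribution from the non-missing values only, under the missing-at-random assumption. This is not the code of preprocessCore's `normalize.quantiles()` or limma's `normalizeQuantiles()`. The function names, the linear interpolation of each column's quantiles onto a common grid, and the average-rank handling of ties are all assumptions made for the sketch.

```python
import math


def _interp_sorted(values, frac):
    """Linearly interpolate the sorted list `values` at fraction `frac` in [0, 1]."""
    pos = frac * (len(values) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(values) - 1)
    w = pos - lo
    return values[lo] * (1 - w) + values[hi] * w


def _average_ranks(values):
    """0-based ranks of `values`, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def quantile_normalize(columns, allow_na=False):
    """Quantile-normalize equal-length columns (hypothetical API); NaN marks NA."""
    n = len(columns[0])
    has_na = any(math.isnan(x) for col in columns for x in col)
    # Step 1: refuse NA input unless the caller explicitly opts in.
    if has_na and not allow_na:
        raise ValueError("NA values found; pass allow_na=True to assume missing at random")

    observed = [sorted(x for x in col if not math.isnan(x)) for col in columns]
    if any(len(obs) == 0 for obs in observed):
        raise ValueError("every column needs at least one non-missing value")

    # Step 2 (MAR assumption): build the reference distribution by averaging each
    # column's interpolated quantiles on a common n-point grid, non-missing data only.
    grid = [i / (n - 1) if n > 1 else 0.5 for i in range(n)]
    reference = [
        sum(_interp_sorted(obs, p) for obs in observed) / len(observed) for p in grid
    ]

    result = []
    for col in columns:
        present = [i for i, x in enumerate(col) if not math.isnan(x)]
        ranks = _average_ranks([col[i] for i in present])
        m = len(present)
        new_col = [math.nan] * n  # missing entries stay missing
        for i, r in zip(present, ranks):
            # Map the value's rank among the observed values to a quantile fraction.
            frac = r / (m - 1) if m > 1 else 0.5
            new_col[i] = _interp_sorted(reference, frac)
        result.append(new_col)
    return result


print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
print(quantile_normalize([[5.0, math.nan, 3.0], [4.0, 1.0, 6.0]], allow_na=True))
```

With no missing values and no ties, each value is replaced by the mean of the same-rank order statistics across columns, which is ordinary quantile normalization. With `allow_na=True`, missing entries are simply carried through as NaN. A truncation-driven missingness mechanism would need a different treatment, as noted in the thread.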
