[Bioc-devel] [BioC] Peculiar behaviour of normalize.quantiles (in affy, preprocessCore) if there are NA data

Fri Jul 13 01:11:10 CEST 2007

>Date: Wed, 11 Jul 2007 07:04:48 -0700
>From: Seth Falcon <sfalcon at fhcrc.org>
>Subject: Re: [Bioc-devel] [BioC] Peculiar behaviour of
>         normalize.quantiles (in affy, preprocessCore) if there are NA data
>To: Ben Bolstad <bmb at bmbolstad.com>
>Cc: bioc-devel <bioc-devel at stat.math.ethz.ch>
>
>[moved to bioc-devel, where this should have started I think]
>
>Ben Bolstad <bmb at bmbolstad.com> writes:
>
> > Wolfgang,
> >
> > The code in preprocessCore for quantile normalization shows its legacy:
> > it was developed around probe-level Affymetrix data straight from CEL
> > files, where NA values are not to be expected. There may or may not be
> > comments to that effect in the C code documentation (actually there is
> > one further down in the qnorm.c file, for a slight variation on the
> > implementation).
> >
> > If you are willing to make the assumption that the missing data
> > mechanism is "missing at random", then I think the fix is fairly trivial:
> > just estimate the distribution using the non-missing data. If it is
> > instead driven by, say, a truncation mechanism, a different fix would be
> > needed.
> >
> > In either case I don't think the current situation is desirable, and
> > it should be fixed.
>
>How about:
>
>1. Let's add code to check for and raise an error if any NA's are
>    found.  This should be easy and can be done quickly.
>
>2. Then we could consider adding an argument that allows NA's and
>    handles things under the missing at random assumption, along with
>    documentation.
>
>+ seth

As noted later by Wolfgang, the normalizeQuantiles() function in 
limma does exactly this.
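
For illustration, here is a minimal sketch of the two proposals above. It
is written in Python and is neither the C code in preprocessCore nor the R
code in limma. The names quantile_normalize and allow_na are hypothetical,
and ties are ranked in sort order rather than averaged. With allow_na=False
any NA raises an error (proposal 1). With allow_na=True the reference
distribution is estimated from the non-missing values of each column,
interpolated onto a common grid, which is only reasonable under the
missing-at-random assumption (proposal 2).

```python
import math


def _is_na(v):
    """Treat None and float NaN as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def _interp(sorted_vals, p):
    """Linearly interpolate the p-th quantile (0 <= p <= 1) of a sorted list."""
    if len(sorted_vals) == 1:
        return sorted_vals[0]
    pos = p * (len(sorted_vals) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def quantile_normalize(columns, allow_na=False):
    """Quantile-normalize equal-length columns (one list per array).

    Hypothetical helper for illustration only.
    """
    n = len(columns[0])
    if not allow_na and any(_is_na(v) for col in columns for v in col):
        raise ValueError("NA values found; pass allow_na=True to treat "
                         "them as missing at random")

    # Common probability grid with one point per row.
    grid = [i / (n - 1) for i in range(n)] if n > 1 else [0.0]

    # Reference distribution: mean of each column's quantiles, computed
    # from the non-missing values only.
    target = [0.0] * n
    used = 0
    for col in columns:
        obs = sorted(v for v in col if not _is_na(v))
        if not obs:
            continue  # a column with no data contributes nothing
        used += 1
        for i, p in enumerate(grid):
            target[i] += _interp(obs, p)
    if used == 0:
        raise ValueError("no non-missing data to estimate the distribution")
    target = [t / used for t in target]

    # Map each observed value to the reference quantile at its rank;
    # missing entries stay missing.
    out = []
    for col in columns:
        idx = sorted((i for i, v in enumerate(col) if not _is_na(v)),
                     key=lambda i: col[i])
        m = len(idx)
        new = [float("nan")] * n
        for rank, i in enumerate(idx):
            p = rank / (m - 1) if m > 1 else 0.5
            new[i] = _interp(target, p)
        out.append(new)
    return out
```

With complete data this reduces to ordinary quantile normalization, since
every column then has the same number of ranks as the grid has points.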

Regards
Gordon