[BioC] missing values in limma/contrasts.fit

Tue Dec 15 10:40:09 CET 2009

Dear Albyn,

On Monday 14 December 2009 20:15:05 Albyn Jones wrote:
> Dear BioConductor Folk
> 
> The help file for contrasts.fit states:
> 
>      "Warning. For efficiency reasons, this function does not
>      re-factorize the design matrix for each probe. A consequence is
>      that, if the design matrix is non-orthogonal and the original fit
>      included quality weights or missing values, then the unscaled
>      standard deviations produced by this function are approximate
>      rather than exact. The approximation is usually acceptable...."
> 
> My attention was attracted to the statement when a colleague in
> biology asked me why one would get different sets of probes identified
> as differentially expressed, depending on which individual or
> biological sample was selected as the reference in a balanced loop
> design.
> 
> My experience, admittedly limited, suggests that the computational
> efficiency gain is not worth the loss of accuracy.  Even if one has to
> sacrifice the efficiency of a single pass through the raw data, at
> least one gets correct results.  I have hacked a version of lmFit to
> evaluate contrasts with standard errors based on the exact covariance
> matrix.  It runs essentially as quickly as lmFit, so I find the
> efficiency argument uncompelling.
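
To make the size of the discrepancy concrete, here is a minimal sketch in
plain Python. It is not limma code, and the "approximate" formula is only my
reading of the help-file warning: keep one coefficient correlation matrix from
the complete design and rescale it by each probe's own unscaled standard
deviations. The toy design (three groups, two arrays each, group A as the
reference) and all function names are made up for illustration.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (small, non-singular matrices)."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def unscaled_cov(design, observed):
    """(X'X)^-1 using only the arrays on which this probe was observed."""
    rows = [r for r, keep in zip(design, observed) if keep]
    return inverse(matmul(transpose(rows), rows))

def quad_form(c, m):
    k = len(c)
    return sum(c[i] * m[i][j] * c[j] for i in range(k) for j in range(k))

def exact_sd(cov_probe, contrast):
    """Unscaled SD of the contrast from the probe's own covariance matrix."""
    return math.sqrt(quad_form(contrast, cov_probe))

def approx_sd(cov_probe, cov_full, contrast):
    """ASSUMED approximation: probe-specific SDs, full-design correlations."""
    k = len(contrast)
    sd = [math.sqrt(cov_probe[i][i]) for i in range(k)]
    mixed = [[sd[i] * sd[j] * cov_full[i][j]
              / math.sqrt(cov_full[i][i] * cov_full[j][j])
              for j in range(k)] for i in range(k)]
    return math.sqrt(quad_form(contrast, mixed))

# Columns: intercept (group A), B - A, C - A. Non-orthogonal by construction.
design = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]]
contrast = [0, -1, 1]                      # C - B

all_seen = [True] * 6
one_na = [False] + [True] * 5              # one reference (A) array is NA

cov_full = unscaled_cov(design, all_seen)
cov_probe = unscaled_cov(design, one_na)

exact_full = exact_sd(cov_full, contrast)
approx_full = approx_sd(cov_full, cov_full, contrast)
exact = exact_sd(cov_probe, contrast)
approx = approx_sd(cov_probe, cov_full, contrast)

print(f"no NA:  exact={exact_full:.4f} approx={approx_full:.4f}")
print(f"one NA: exact={exact:.4f} approx={approx:.4f}")
```

With no missing values the two numbers agree. With a single NA in the
reference group the exact unscaled SD for C - B is unchanged, since that
contrast does not involve group A at all, while the approximate one is
inflated by about 22% in this toy case. That is at least consistent with
results changing according to which sample is chosen as the reference.
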
> 
> A search of the archive produced several discussions of missing values
> in limma.  The main argument I see is Gordon Smyth's (Date: 2008-03-08)
> 
>    "The ideal solution is not to introduce missing values into your
>     data in the first place.  In my experience, missing values are
>     almost always avoidable.  I have never seen a situation where it
>     was necessary or desirable to introduce a large proportion of
>     missing values."
> 
> My colleagues in biology report that they inspect their arrays
> visually and note probes which have been scratched, probes covered by
> background blobs and the like.  These categories seem to satisfy the

That is exactly the same here with many of my colleagues. 

> missing-at-random criterion: the probe is marked NA not because it is
> saturated or below background, but because it was unreadable for
> reasons unrelated to the response.

Yes. And those NAs are, then, truly NAs. Even if they were not MCAR or MAR, 
they are NAs nonetheless.

However, it is also the case that these "true NAs" are only a very minor 
fraction of the total number of points.

> 
> I'd appreciate feedback: has anyone else already done this? Would
> others find this useful?  Are there objections I have overlooked?
> 

Yes, I'd find it useful.

Best,

R.

> albyn
> 
> 

-- 
Ramon Diaz-Uriarte
Biocomputing Programme
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://ligarto.org/rdiaz
