[BioC] Re: Limma: four suggestions related to weights and background correction

Gordon Smyth smyth at wehi.edu.au
Mon Feb 9 07:35:28 MET 2004


At 03:05 AM 24/01/2004, Ramon Diaz-Uriarte wrote:
>Dear Gordon and other bioconductor-users,
>
>We have encountered a few recurrent problem when using limma. If I may, I'd
>like to make four suggestions for changes which might benefit other people:
>
>
>1. When "backgroundCorrect(temp1, method = "none")" is used columns for Rb 
>and
>Gb are eliminated from the RGList object. We have found this problematic,
>because, even if one does not want to do background subtraction, one might
>still want to look at image plots of background, just to check nothing funny
>has happened with the background. The problem is that the subsequent step of
>using "normalizeWithinArrays" calls MA.RG, which itself uses the presence or
>absence of the Rb and Gb to do (or not) the subtraction. Then, if we still
>want to plot background and do the normalization, we either have to make sure
>we are done with all background business before we do the normalization
>(since after that all background info will be lost) or we need to keep the
>normalized and the unnormalized objects around. None of these seem ideal. We
>wonder if it would be possible to pass the type of subtraction to
>"normalizeWithinArrays" and to "MA.RG" as a parameter, so that the original
>object preservers all the information, but still we can do no background
>subtraction. (We are hacking the code here, for our internal use).

It is certainly a possibility that we will record the type of background 
correction in the future, probably simply as a character string 
corresponding to the 'method' argument of backgroundCorrect(). I don't want 
to store background intensities values as part of normalized data objects 
though. In this sense I think one should be done with background correction 
by the time you do the normalization.

I am not really sure why you need to hack the code. Why not simply use 
something like

MA <- normalizeWithinArrays(backgroundCorrect(RG,method="none"))

In this way, you have normalized without background correcting, but at the 
same time the background values are preserved in the RGList object.

>2. We have found it slightly confussing that the M value of a point with
>weight 0 for the normalization is not an NA. Sure, the points with weight = 0
>do not get used to fit the loess curve, but they still have a residual. But,
>so far, with the wet lab people we have worked, it has always been the case
>that if a point was deemed unsuitable for the normalization it was also
>unsuitable for further analyses (though, occasionally, points suitable for
>fitting the normalization curve have been deemed unsuitable for further
>analyses).
>
>Are we missing something obvious here?

weights=0 are definitely not the same as missing values. Consider the 
following application for example: suppose you have a set of control spots 
and you want to normalize all the data to the loess curve going through the 
control spots. You can simply set the weights=1 for the control spots and 
zero otherwise to achieve this effect. Introducing NAs would destroy your 
data of interest.

I am against introducing NAs as a general rule unless the data really is 
missing or is undefined in some intrinsic sense.

>3. Occasionally, a complet print-tip group might be missing which will make
>"normalizeWithinArrays" to fail. We have added the following to that
>function, in the print-tip loess case, right before the line "spots <- spots
>+ nspots" (and substituting the direct call to loessFit):
>
>******************************************
>num.fill <- length(object$M[spots, j])
>object$M[spots, j] <- rep(NA, num.fill)
>no.objects <-
>     try(object$M[spots, j]
>         <- loessFit(y, x, w, span = span,
>                     iterations = iterations)$residuals)
>if(class(no.objects) == "try-error")
>     print(paste("WARNING: in print-tip loess normalization",
>                 "no samples in array", j, "grid row", gridr,
>                 "grid column", gridc))
>********************************************

That's a good suggestion. I would note though that if this phenomenon 
occurs for you, you probably shouldn't be using printtiploess normalization 
in the first place - you should use plain "loess" instead.

>4. When all weights are either 0 or 1, there are slight differences in the
>results of the normalization (of points with weight = 1) depending on whether
>or not the cases with weight of 0 are passed to subsequent calls. Sure, these
>differences are irrelevant, but it can be disconcerting. Wouldn't it be
>appropriate to check for the binary situation of weights either 0 or 1, and
>if so, exclude points with weight 0, which would allow us to call directly
>".C("lowess"", instead of ".vsimpleLoess" (inside "loessFit")?

There is no perfect solution really. The change that you suggest would mean 
that very small weights would become discontinuously different from zero 
weights. The real solution would be to have a lowess function which accepts 
weights and which interpolates based on x-value distances, but such a 
function isn't available. I am inclined to leave as is.

Thanks for all your suggestions.
Gordon

>***************
>The following is an example.
>(The data sets are at:
>http://bioinfo.cnio.es/~rdiaz/00G66.txt)
>
>
>temp01 <- read.maimages(files = "00G66.txt",
>                        source = "genepix",
>                        wt.fun=wtflags(0))
>
>temp01.b <- temp01
>set.to.na <- which(temp01.b$weights == 0)
>temp01.b$R[set.to.na] <- NA
>temp01.b$G[set.to.na] <- NA
>
>temp01.c <- temp01.b
>temp01.c$weights <- NULL
>
>n1c <- normalizeWithinArrays(temp01.c, layout = temp.layout,
>                              iterations = 5,
>                              method = "loess")
>n1b <- normalizeWithinArrays(temp01.b, layout = temp.layout,
>                              iterations = 5,
>                              method = "loess")
>n1 <- normalizeWithinArrays(temp01, layout = temp.layout,
>                              iterations = 5,
>                             method = "loess")
>
>pairs(cbind(n1$M, n1b$M, n1c$M))
>summary(n1$M - n1b$M)
>
>
>******************
>
>Best,
>
>R.
>
>--
>Ramón Díaz-Uriarte
>Bioinformatics Unit
>Centro Nacional de Investigaciones Oncológicas (CNIO)
>(Spanish National Cancer Center)
>Melchor Fernández Almagro, 3
>28029 Madrid (Spain)
>Fax: +-34-91-224-6972
>Phone: +-34-91-224-6900
>
>http://bioinfo.cnio.es/~rdiaz
>PGP KeyID: 0xE89B3462
>(http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc)



More information about the Bioconductor mailing list