[BioC] Re: Limma: four suggestions related to weights and background correction

Ramon Diaz-Uriarte rdiaz at cnio.es
Mon Feb 9 09:28:27 MET 2004


Dear Gordon,

Thank you very much for your comments. A few answers:

>
> It is certainly a possibility that we will record the type of background
> correction in the future, probably simply as a character string
> corresponding to the 'method' argument of backgroundCorrect(). I don't want
> to store background intensities values as part of normalized data objects
> though. In this sense I think one should be done with background correction
> by the time you do the normalization.
>
> I am not really sure why you need to hack the code. Why not simply use
> something like
>
> MA <- normalizeWithinArrays(backgroundCorrect(RG,method="none"))
>
> In this way, you have normalized without background correcting, but at the
> same time the background values are preserved in the RGList object.

Yes but, because of the way we have structured our code (which we'll probably 
need to change), in addition to the normalization call, we also call MA.RG 
for some diagnostic plots of foreground and background intensities. 
Thus, we need to keep the backgroun data around, even if the non-transformed M 
is obtained without background subtraction (and plotted as such).
I am not sure if this situation is common to other people, or if it is just an 
idiosyncrasy.


>
> >2. We have found it slightly confussing that the M value of a point with
> >weight 0 for the normalization is not an NA. Sure, the points with weight
> > = 0 do not get used to fit the loess curve, but they still have a
> > residual. But, so far, with the wet lab people we have worked, it has
> > always been the case that if a point was deemed unsuitable for the
> > normalization it was also unsuitable for further analyses (though,
> > occasionally, points suitable for fitting the normalization curve have
> > been deemed unsuitable for further analyses).
> >
> >Are we missing something obvious here?
>
> weights=0 are definitely not the same as missing values. Consider the
> following application for example: suppose you have a set of control spots
> and you want to normalize all the data to the loess curve going through the
> control spots. You can simply set the weights=1 for the control spots and
> zero otherwise to achieve this effect. Introducing NAs would destroy your
> data of interest.
>
> I am against introducing NAs as a general rule unless the data really is
> missing or is undefined in some intrinsic sense.


Thank you for your comments; I was completely overlooking that possibility (a 
"local effect" because we rarely have true, legitimate control spots in the 
arrays we often handle here).


>
> >3. Occasionally, a complet print-tip group might be missing which will
> > make "normalizeWithinArrays" to fail. We have added the following to that
> > function, in the print-tip loess case, right before the line "spots <-
> > spots + nspots" (and substituting the direct call to loessFit):
> >
> >******************************************
> >num.fill <- length(object$M[spots, j])
> >object$M[spots, j] <- rep(NA, num.fill)
> >no.objects <-
> >     try(object$M[spots, j]
> >         <- loessFit(y, x, w, span = span,
> >                     iterations = iterations)$residuals)
> >if(class(no.objects) == "try-error")
> >     print(paste("WARNING: in print-tip loess normalization",
> >                 "no samples in array", j, "grid row", gridr,
> >                 "grid column", gridc))
> >********************************************
>
> That's a good suggestion. I would note though that if this phenomenon
> occurs for you, you probably shouldn't be using printtiploess normalization
> in the first place - you should use plain "loess" instead.


I didn't make the context clear: we are using this as part of a web interface 
for normalization, and having the normalization functions to return an error 
is undesirable. With the above modification the R run finishes, and the web 
application is happy, but the user is given a warning, letting her choose to 
re-do things otherwise.

However, regarding the type of normalization, I am not sure I see why, if this 
happens occasionally, the user should prefer global loess: if a print-tip 
group in a specific array has been damaged (e.g., a scratch), but the rest of 
the print-tip groups are OK, and the usual assumptions are believed to hold 
for each of the other print-tip groups, then I think print-tip loess is 
justified.


>
> >4. When all weights are either 0 or 1, there are slight differences in the
> >results of the normalization (of points with weight = 1) depending on
> > whether or not the cases with weight of 0 are passed to subsequent calls.
> > Sure, these differences are irrelevant, but it can be disconcerting.
> > Wouldn't it be appropriate to check for the binary situation of weights
> > either 0 or 1, and if so, exclude points with weight 0, which would allow
> > us to call directly ".C("lowess"", instead of ".vsimpleLoess" (inside
> > "loessFit")?
>
> There is no perfect solution really. The change that you suggest would mean
> that very small weights would become discontinuously different from zero
> weights. The real solution would be to have a lowess function which accepts
> weights and which interpolates based on x-value distances, but such a
> function isn't available. I am inclined to leave as is.


I see now that my suggestion was simplistic and naive.


>
> Thanks for all your suggestions.

Thanks for all your comments (and, of course, for the limma package in the 
first place!)


R.
> Gordon
>
> >***************
> >The following is an example.
> >(The data sets are at:
> >http://bioinfo.cnio.es/~rdiaz/00G66.txt)
> >
> >
> >temp01 <- read.maimages(files = "00G66.txt",
> >                        source = "genepix",
> >                        wt.fun=wtflags(0))
> >
> >temp01.b <- temp01
> >set.to.na <- which(temp01.b$weights == 0)
> >temp01.b$R[set.to.na] <- NA
> >temp01.b$G[set.to.na] <- NA
> >
> >temp01.c <- temp01.b
> >temp01.c$weights <- NULL
> >
> >n1c <- normalizeWithinArrays(temp01.c, layout = temp.layout,
> >                              iterations = 5,
> >                              method = "loess")
> >n1b <- normalizeWithinArrays(temp01.b, layout = temp.layout,
> >                              iterations = 5,
> >                              method = "loess")
> >n1 <- normalizeWithinArrays(temp01, layout = temp.layout,
> >                              iterations = 5,
> >                             method = "loess")
> >
> >pairs(cbind(n1$M, n1b$M, n1c$M))
> >summary(n1$M - n1b$M)
> >
> >
> >******************
> >
> >Best,
> >
> >R.
> >
> >--
> >Ramón Díaz-Uriarte
> >Bioinformatics Unit
> >Centro Nacional de Investigaciones Oncológicas (CNIO)
> >(Spanish National Cancer Center)
> >Melchor Fernández Almagro, 3
> >28029 Madrid (Spain)
> >Fax: +-34-91-224-6972
> >Phone: +-34-91-224-6900
> >
> >http://bioinfo.cnio.es/~rdiaz
> >PGP KeyID: 0xE89B3462
> >(http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc)

-- 
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://bioinfo.cnio.es/~rdiaz
PGP KeyID: 0xE89B3462
(http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc)



More information about the Bioconductor mailing list