[BioC] Maybe a usefull for limma users

Fri Dec 17 22:59:06 CET 2004

> Date: Fri, 17 Dec 2004 10:35:24 +0100
> From: "Dr. Ir. B. van Breukelen" <b.vanbreukelen at bio.uu.nl>
> Subject: [BioC] Maybe a usefull for limma users
> To: <bioconductor at stat.math.ethz.ch>
> Message-ID: <SOLIS1023Z4d5htAyD70000053a at solis102.soliscom.uu.nl>
> Content-Type: text/plain;	charset="us-ascii"
>
> Dear Limma users,
>
> In our lab we print our own microarray slides. We print both dedicated and
> whole genome arrays. With dedicated arrays there is the possibility to find
> large subsets of data which is up or down regulated, rendering it difficult
> to normalize this data with loess or printtip loess. Recently we have a
> slide containing tags for plant as well as pathogen together with spikes and
> other control spots. It now happens that if you have a time course
> experiment with the pathogen on the plants you get a very large subset of
> genes that seem to be upregulated however this is is not true. This is
> because in our control (not infected plants) there is almost no RNA present
> of our pathogen. In time this pathogen grows so relatively more pathogen RNA
> is produced. To follow these gene products trough time I needed a way to
> normalize on a subset of genes (pathogen) only. If have written a "little"
> function and extension for the normalization function that can be used in
> limma to render it possible to normalize on a subset of genes only. The
> first function is called flags()
> RG <- flags(RG, "regex")  where regex is eg. "genesX|genesY|genesZ" The
> regex uses the RG$genes$Name column and searches for names similar to the
> ones in your regex list.  e.g conrol1, control2 can be regex as "con"
> After this I added to the normalizeWithinArrays() function an extra option:
> MA<-normalizeWithinArrays(RG, method="subsetloess")
> Basically the method flags gives all selected genes a 1 and the others a 0
> than it uses this as weights in the normalization (loess). If you woul like
> to also use the spotweights themselves than you can multiply the RG$flags
> column with the weights column. After normalization you'll see that only
> your subset of genes have been used to calculate the loess fit. In our case
> this works very well. I have validated my data against other experiments in
> which we compensate for the mRNA level difference (which is time consuming)
> and the results are similar. I fit the data lmFit only on the genes of
> interest and voila great results.
>
> If you would like to know more or see some scatterplots I'm happy to e-mail
> them.
> I'm well aware that other possibilities for normalization exists however
> this function is simple and fast (at least for us) and therefore it might be
> usefull for others too.
>
> Yours,
>
> Bas van Breukelen

Thanks for your suggestion.  I definitely agree with you that normalizing on a subset of control
genes can be a very useful technique.  For some reason it an idea which seems to have been slow to
catch on.

Another way to do what you want using existing limma functions would be (assuming RG to be an
RGList and 'flags' to be a vector of 0s and 1s)

w <- array(flags, dim(RG))
MA <- normalizeWithinArrays(RG, method="loess", weights=w)

I think this is equivalent to your code below.  Alternatively one could use modifyWeights() to
alter existing quality weights in RG rather than overwrite them, as on page 15 of the Limma User's
Guide.

Gordon

> === functions ===
>
> normalizeWithinArrays
>
> // the whole function is long... only the last part containing the addition
> is shown here.
> }, robustspline = {
>         if (is.null(layout))
>             stop("Layout argument not specified")
>         for (j in 1:narrays) object$M[, j] <-
> normalizeRobustSpline(object$M[,
>             j], object$A[, j], layout, df = df, method = robust)
>     }, subsetloess = {
> 	controlspots = TRUE
> 	for (j in 1:narrays){
> 		y <- object$M[,j]
> 		x <- object$A[,j]
> 		w <- object$flags
> 		object$M[,j] <- loessFit(y,x,w, span=span, iterations =
> iterations)$residuals
> 	}
>     })
>     object
> }
>
> flags <- function (object, regex)
> {
> object$flags <- regexpr(regex, as.vector(object$genes[,6]))
> rows <- NROW(object$flags)
>
> 	for (j in 1:rows){
> 		if (object$flags[j] < 0){
> 			object$flags[j] <- 0
> 		}else{
> 			object$flags[j] <- 1
> 		}
> 	}
> 	return(object)
> }
>
> ===========
>
> =================================
> Dr. Ir. B. van Breukelen
> PostDoc, Bioinformatics, Molecular genetics
> Dept. of Biology.
> Room N407: H.R. Kruytgebouw
> Padualaan 8
> 3574 CH Utrecht
>
> Tel:       +31(0)30 253 3355
> Mobile: +31(0)6 24 996046
> e-mail:  b.vanbreukelen at bio.uu.nl
> website: http://genomics.bio.uu.nl/