[BioC] Maybe a usefull for limma users

Fri Dec 17 10:35:24 CET 2004

Dear Limma users,

In our lab we print our own microarray slides. We print both dedicated and
whole genome arrays. With dedicated arrays there is the possibility to find
large subsets of data which is up or down regulated, rendering it difficult
to normalize this data with loess or printtip loess. Recently we have a
slide containing tags for plant as well as pathogen together with spikes and
other control spots. It now happens that if you have a time course
experiment with the pathogen on the plants you get a very large subset of
genes that seem to be upregulated however this is is not true. This is
because in our control (not infected plants) there is almost no RNA present
of our pathogen. In time this pathogen grows so relatively more pathogen RNA
is produced. To follow these gene products trough time I needed a way to
normalize on a subset of genes (pathogen) only. If have written a "little"
function and extension for the normalization function that can be used in
limma to render it possible to normalize on a subset of genes only. The
first function is called flags()
RG <- flags(RG, "regex")  where regex is eg. "genesX|genesY|genesZ" The
regex uses the RG$genes$Name column and searches for names similar to the
ones in your regex list.  e.g conrol1, control2 can be regex as "con"
After this I added to the normalizeWithinArrays() function an extra option:
MA<-normalizeWithinArrays(RG, method="subsetloess")
Basically the method flags gives all selected genes a 1 and the others a 0
than it uses this as weights in the normalization (loess). If you woul like
to also use the spotweights themselves than you can multiply the RG$flags
column with the weights column. After normalization you'll see that only
your subset of genes have been used to calculate the loess fit. In our case
this works very well. I have validated my data against other experiments in
which we compensate for the mRNA level difference (which is time consuming)
and the results are similar. I fit the data lmFit only on the genes of
interest and voila great results.

If you would like to know more or see some scatterplots I'm happy to e-mail
them. 
I'm well aware that other possibilities for normalization exists however
this function is simple and fast (at least for us) and therefore it might be
usefull for others too.

Yours,

Bas van Breukelen

=== functions ===

normalizeWithinArrays

// the whole function is long... only the last part containing the addition
is shown here.
}, robustspline = {
        if (is.null(layout)) 
            stop("Layout argument not specified")
        for (j in 1:narrays) object$M[, j] <-
normalizeRobustSpline(object$M[, 
            j], object$A[, j], layout, df = df, method = robust)
    }, subsetloess = {
	controlspots = TRUE
	for (j in 1:narrays){
		y <- object$M[,j]
		x <- object$A[,j]
		w <- object$flags
		object$M[,j] <- loessFit(y,x,w, span=span, iterations =
iterations)$residuals
	}
    })
    object
}

flags <- function (object, regex) 
{
object$flags <- regexpr(regex, as.vector(object$genes[,6]))
rows <- NROW(object$flags)

	for (j in 1:rows){
		if (object$flags[j] < 0){
			object$flags[j] <- 0
		}else{
			object$flags[j] <- 1	
		}
	}
	return(object)
}

===========

=================================
Dr. Ir. B. van Breukelen
PostDoc, Bioinformatics, Molecular genetics
Dept. of Biology. 
Room N407: H.R. Kruytgebouw
Padualaan 8
3574 CH Utrecht

Tel:       +31(0)30 253 3355
Mobile: +31(0)6 24 996046
e-mail:  b.vanbreukelen at bio.uu.nl
website: http://genomics.bio.uu.nl/