[BioC] normalization data with .txt or .xls file by marray or limma

Fri Mar 5 03:58:38 MET 2004

Hi Darwin,

>Once I get my cDNA microarray data, when should I delete the poor quality spots: before or after normalization? I think it needs to be done before normalization. In that case, what rule should I use?
>
A good question, and not an easy one to answer.  Firstly, poor quality
spots can be defined in many ways (there are a number of papers on the
subject, and I can send you the references if you're interested).  Most
involve computing a spot-specific quality measure, then filtering
(removing) genes with an unfavourable value of this measure from
subsequent analysis.
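As a rough illustration of this kind of filtering, here is a minimal
base-R sketch.  The quality score, the 0.5 cutoff and the helper name
filter_spots() are all hypothetical; in practice the measure would come
from your image analysis software or one of the published criteria.
Removed spots are set to NA so that later steps treat them as missing.

```r
# Minimal sketch: hard filtering of log-ratios by a spot-level quality score.
# filter_spots(), the score and the 0.5 cutoff are hypothetical choices.
filter_spots <- function(M, quality, min_quality = 0.5) {
  stopifnot(identical(dim(M), dim(quality)))
  M[quality < min_quality] <- NA   # poor spots become missing values
  M
}

# Toy data: 3 genes (rows) x 2 arrays (columns)
M <- matrix(c(0.2, -1.1, 0.5,
              0.3, -0.9, 2.4), nrow = 3)
quality <- matrix(c(0.9, 0.8, 0.1,
                    0.7, 0.95, 0.6), nrow = 3)
filtered <- filter_spots(M, quality)
filtered[3, 1]   # NA: quality 0.1 is below the cutoff
```

The drawback of a hard cutoff is that a spot is either kept in full or
thrown away completely, with nothing in between.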

In limma, spot quality weights can be used in the normalization and
linear models to do this.  Log-ratios from spots assigned low weights
(close to 0) have less influence on the normalization and linear model
fit than spots with high weights (around 1).  Spots with zero weight are
ignored.
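To see why a zero weight amounts to ignoring a spot, consider the
simplest case: estimating one gene's average log-ratio across replicate
arrays.  An intercept-only weighted least-squares fit gives the weighted
mean, so a weight-0 observation drops out entirely and a small weight
only shrinks its influence.  The sketch below uses base R's lm() and
weighted.mean() with made-up values; lmFit() itself fits a full design
matrix gene by gene, so this only illustrates the principle.

```r
# Log-ratios for one gene on four replicate arrays (toy values);
# the fourth value looks like an outlier from a bad spot.
M <- c(0.8, 1.1, 0.9, 4.0)

w_zero <- c(1, 1, 1, 0)    # bad spot ignored
w_low  <- c(1, 1, 1, 0.1)  # bad spot down-weighted

est_zero <- weighted.mean(M, w_zero)   # (0.8 + 1.1 + 0.9) / 3
est_low  <- weighted.mean(M, w_low)    # (2.8 + 0.4) / 3.1

# An intercept-only weighted least-squares fit gives the same estimate
fit <- lm(M ~ 1, weights = w_zero)
c(est_zero, est_low, coef(fit)[[1]])
```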

These relative weights can be determined automatically from the output
of the image analysis programs Spot and GenePix.  For Spot, the weights
are based on the ideal spot size (spots smaller or larger than ideal are
down-weighted); for GenePix, they are derived from the quality flags (a
good spot has flag 0 and gets full weight, a bad spot has a negative
flag and gets a low weight).  Specifying the 'weights' argument in
normalizeWithinArrays() and lmFit() makes use of the weights in the
normalization and linear model analysis.  At the end of this message is
an example which might be helpful.
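limma's own helpers for this are wtarea() (Spot) and wtflags()
(GenePix), usually supplied to read.maimages() through its wt.fun
argument; see their help pages for the exact rules and defaults.  The
sketch below is not their code.  It uses two hypothetical functions,
flag_weights() and area_weights(), to mimic the two rules described
above, with an assumed low weight of 0.1 and an assumed ideal area of
160 to 170 pixels.

```r
# Hypothetical helpers illustrating the two weighting rules.
# They are NOT limma's wtflags()/wtarea(); the low weight (0.1) and the
# ideal area range (160-170 pixels) are assumptions for illustration only.

# GenePix-style: flag >= 0 keeps full weight, a negative flag gets a low weight
flag_weights <- function(flags, low = 0.1) {
  ifelse(flags < 0, low, 1)
}

# Spot-style: full weight inside the ideal area range, shrinking towards 0
# as a spot becomes smaller or larger than ideal
area_weights <- function(area, ideal = c(160, 170)) {
  w <- rep(1, length(area))
  small <- area < ideal[1]
  large <- area > ideal[2]
  w[small] <- area[small] / ideal[1]   # e.g. half the minimum area -> 0.5
  w[large] <- ideal[2] / area[large]   # e.g. twice the maximum area -> 0.5
  w
}

flag_weights(c(0, -50, -100, 0))   # 1.0 0.1 0.1 1.0
area_weights(c(80, 165, 340))      # 0.5 1.0 0.5
```

Either result can be arranged into a genes-by-arrays matrix and used as
RG$weights in the same way as the hand-made weights in the example below.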

Does the image analysis package you're using provide any quality flags 
that you might be able to use?

Sorry, I don't have a more definite answer to your question.  Best wishes,

Matt Ritchie

library(limma)

# Set up a random dataset of 6 replicate arrays with 100 genes on each array
# (abs() keeps the simulated intensities positive so log-ratios are finite)
RG <- new("RGList", list(R=matrix(abs(rnorm(100*6, 1000, 300)), 100, 6),
                         G=matrix(abs(rnorm(100*6, 1000, 300)), 100, 6),
                         Rb=NULL, Gb=NULL))
# specify the array grid layout: 1 x 5 print-tip groups of 5 x 4 spots = 100 spots
RG$printer <- list(nspot.r=5, nspot.c=4, ngrid.r=1, ngrid.c=5)

# define the weights: all spots are given full weight (1), except
RG$weights <- matrix(1, 100, 6)
RG$weights[1,] <- 0    # the observations for gene 1 (deemed to be poor quality)
RG$weights[,1] <- 0    # and the observations from array 1 (bad array)

# spots with 0 weights (from array 1 and gene 1 in this example) are
# ignored in the normalization and linear model fit
MA <- normalizeWithinArrays(RG, weights=RG$weights)
fit <- lmFit(MA, weights=RG$weights)
fit <- eBayes(fit)

>Thanks in advance!
>
>Darwin
>