[BioC] Removing control probes and treating technical replicates in Agilent one colour data

Wed Apr 16 16:22:26 CEST 2008

Dear all,

I am working with single channel Agilent arrays, and need some help in  
processing and the data correctly, using Limma. (I have followed the  
advice I found in the archives for using Limma for one colour Agilent  
data, by using dummy variables for the red channel.)

I started out weighting all the control probes and spots flagged as  
outliers by Feature Extraction to zero when loading the data. The  
differentially expressed genes I found downstream seemed logical. But  
then I understood that the control probes should not be weighted zero,  
but should be included in the normalisation procedure, and instead  
removed before statistical analysis to increase power. However, when I  
tried to subset the expression object in lmFit(), the downstream  
result was a very different and shorter genelist. Also, the second  
genelist includes some control probes (the first one doesn?t), which  
should not be possible, since all the control probes were supposed to  
be removed.

I have 38 arrays with 36 biological and 2 technical replicates. I?ve  
loaded the .txt files with RG <- read.maimages(). I normalized between  
arrays with vsn. I won?t go into the design and contrast matrices,  
because I don?t think it?s relevant for my question. I?ve tried to  
include the information from the technical replicates by using  
duplicateCorrelation() and defined which samples are biological  
replicates as a column (Biolrep= 1,2,3,4,4,5,6,...,36) in the targets  
file. Please tell me if the information I?m giving is insufficient.

Can anyone see any mistakes in my code, or otherwise give an  
explanation to why I am not able to remove the control probes  
correctly or other reasons for the varying results? I am new to  
Bioconductor and fairly new to microarray analysis, so I might be  
doing a very basic mistake. I would also appreciate comments on my  
treatment of the technical replicates, because this also seems to  
change downstream analysis to some extent (in comparison to just  
leaving out the two technical replicates of the analysis).

RG <-read.maimages(...)

Gvsn <- normalizeBetweenArrays(RG$G,method="vsn")

#Replicate structure
biolrep <- targets$Biolrep

corfit<- duplicateCorrelation(Gvsn,
	design=design,
	ndups=1,
	block=biolrep)

#Pick the probes that are not controls
isGene <- RG$genes$ControlType == 0

fit <- lmFit(Gvsn[isGene,],
	design,
	weights=RG$weights[isGene,],
	ndups=1,
	block=biolrep,
	cor=corfit$consensus)

fit2 <- contrasts.fit(fit,cont.matrix)
fit2 <- eBayes(fit2)

And this is the flag function I use before reading the data, which is  
stores in RG$weights is:

FlagFunction <- function(flag) {

##Weight only present spots 1, flagged spots 0.
	present <- x$gIsPosAndSignif == 1
	y <- as.numeric(present & manual)

##Remove flagged spots
	sat <- x$gIsSaturated == 0
	featureOL1 <- x$gIsFeatNonUnifOL == 0
	featureOL2 <- x$gIsFeatPopnOL == 0
	flagged <- (sat & featureOL1 & featureOL2)
	flagged <- grep(FALSE, flagged)
	good <- grep(TRUE, y==1)
	flagged <- intersect(flagged, good)
	y[flagged] <- 0
	y
}

I would be very grateful for any comments, preferably as soon as  
possible since I have quite a time pressure to finish this analysis.

Regards,
Eldrid Borgan