[BioC] Removing control probes and treating technical replicates in Agilent one colour data
eldridb at stud.ntnu.no
eldridb at stud.ntnu.no
Wed Apr 16 16:22:26 CEST 2008
Dear all,
I am working with single channel Agilent arrays, and need some help in
processing and the data correctly, using Limma. (I have followed the
advice I found in the archives for using Limma for one colour Agilent
data, by using dummy variables for the red channel.)
I started out weighting all the control probes and spots flagged as
outliers by Feature Extraction to zero when loading the data. The
differentially expressed genes I found downstream seemed logical. But
then I understood that the control probes should not be weighted zero,
but should be included in the normalisation procedure, and instead
removed before statistical analysis to increase power. However, when I
tried to subset the expression object in lmFit(), the downstream
result was a very different and shorter genelist. Also, the second
genelist includes some control probes (the first one doesn?t), which
should not be possible, since all the control probes were supposed to
be removed.
I have 38 arrays with 36 biological and 2 technical replicates. I?ve
loaded the .txt files with RG <- read.maimages(). I normalized between
arrays with vsn. I won?t go into the design and contrast matrices,
because I don?t think it?s relevant for my question. I?ve tried to
include the information from the technical replicates by using
duplicateCorrelation() and defined which samples are biological
replicates as a column (Biolrep= 1,2,3,4,4,5,6,...,36) in the targets
file. Please tell me if the information I?m giving is insufficient.
Can anyone see any mistakes in my code, or otherwise give an
explanation to why I am not able to remove the control probes
correctly or other reasons for the varying results? I am new to
Bioconductor and fairly new to microarray analysis, so I might be
doing a very basic mistake. I would also appreciate comments on my
treatment of the technical replicates, because this also seems to
change downstream analysis to some extent (in comparison to just
leaving out the two technical replicates of the analysis).
RG <-read.maimages(...)
Gvsn <- normalizeBetweenArrays(RG$G,method="vsn")
#Replicate structure
biolrep <- targets$Biolrep
corfit<- duplicateCorrelation(Gvsn,
design=design,
ndups=1,
block=biolrep)
#Pick the probes that are not controls
isGene <- RG$genes$ControlType == 0
fit <- lmFit(Gvsn[isGene,],
design,
weights=RG$weights[isGene,],
ndups=1,
block=biolrep,
cor=corfit$consensus)
fit2 <- contrasts.fit(fit,cont.matrix)
fit2 <- eBayes(fit2)
And this is the flag function I use before reading the data, which is
stores in RG$weights is:
FlagFunction <- function(flag) {
##Weight only present spots 1, flagged spots 0.
present <- x$gIsPosAndSignif == 1
y <- as.numeric(present & manual)
##Remove flagged spots
sat <- x$gIsSaturated == 0
featureOL1 <- x$gIsFeatNonUnifOL == 0
featureOL2 <- x$gIsFeatPopnOL == 0
flagged <- (sat & featureOL1 & featureOL2)
flagged <- grep(FALSE, flagged)
good <- grep(TRUE, y==1)
flagged <- intersect(flagged, good)
y[flagged] <- 0
y
}
I would be very grateful for any comments, preferably as soon as
possible since I have quite a time pressure to finish this analysis.
Regards,
Eldrid Borgan
More information about the Bioconductor
mailing list