[BioC] LIMMA and Agilent single color
Gordon K Smyth
smyth at wehi.EDU.AU
Sun Dec 19 02:15:00 CET 2010
Dear Patrick,
Before I answer your questions, can I make a comment on the practice of
flagging bad spots? Your code is flagging spots as bad just because they
are give low intensities. I don't agree with this practice, because it
may be a perfectly good observation of a gene that is not expressed in a
particular condition. While it is good practice to filter genes that are
not expressed in any condition in your experiment, this means filtering
whole rows of your data. I don't think that the practice of filtering
individual spots on the basis of intensity is a useful one.
Have said that, I've interpolated answers to your questions below.
> Date: Fri, 17 Dec 2010 17:32:14 +0100
> From: De Boever Patrick <patrick.deboever at vito.be>
> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Subject: [BioC] LIMMA and Agilent single color
>
> Dear list members,
>
> My question relates to processing of Agilent single-color arrays. With
> data generated using Agilent's feature extraction software.
> My script so-for is mentioned below.
> The flag information contained in raw data files is transformed to
> weights. I have used 1 for 'good' and 0 for 'bad' in myFlagFun.
>
> If this weight information is loaded:
> -where does LIMMA use this information?
Pretty much everywhere.
> Does the lmFit,contrastsFit need additional statements?
No. The weights are found automatically, provided that you pass the data
objects to the functions. Of course, if you extract the expression matrix
from the data object, then you have to pass the weights explicitly to
lmFit().
> -what happens if a gene is flagged 'bad' on one array, but not on the
> other?
Remember that it is spots that are being flagged, not genes. For gene
(probe), the good spots are used and the bad spots are not.
> -is there a way to verify that only 'good' genes have been used?
Not directly. But summary(fit$df.residual) will show the residual degrees
of freedoms unequal if variable numbers of values have been used.
> Second question,
> My E.avg (MAlist) contains E.avg$genes with gene information, But after
> applying fit->this gene information is lost? No gene annotation my
> topTable.
This is because you didn't pass the information about the genes to
lmFit(). If you use
fit <- lmFit(E.avg, design)
then it will automatically find both the weights and the gene annotation.
But if you use,
fit <- lmFit(E.avg$M, design)
then you're passing only the expression log-ratios to lmFit, and it has
no way to find the gene annotation.
> Am I missing an argument?
You're not making full use of the data objects. limma now has better
features to handle single-channel Agilent arrays. The following code
would be more direct:
targets <- readTargets("targets.txt")
rawObj <- read.maimages(targets,source="agilent",green.only=TRUE,wt.fun=myFlagFun)
Obj.corrected <- backgroundCorrect(rawObj, method="normexp", offset=1)
E <- normalizeBetweenArrays(Obj.corrected, method="quantile")
E.avg <- avereps(E, ID=E$genes$ProbeName)
fit <- lmFit(E.avg,design)
cont.matrix <- makeContrasts(group1vsgroup2=Group1-Group2,levels=design)
fit2 <- contrasts.fit(fit,cont.matrix)
fit2 <- eBayes(fit2)
topTable(fit2)
Note that avereps() doesn't use the weights when making averages. If you
average two spots, one with weight 0 and one with weight 1, you'll get the
average of the two with weight 0.5.
Best wishes
Gordon
> Thank you for providing insight !
>
> Patrick
>
>
> setwd('D:/VITO/R')
> library('Biobase')
> library('convert')
> library('limma')
> AgilentFiles <- list.files(pattern="US")
> myFlagFun <- function(x) {
> #Weight only strongly positive spots 1, everything else 0
> present <- x$gIsPosAndSignif == 1
> probe <- x$ControlType == 0
> manual <- x$IsManualFlag == 0
> strong <- x$gIsWellAboveBG == 1
> y <- as.numeric(present & probe & manual & strong)
>
> #Weight weak spots 0
>
> weak <- strong == FALSE #with values not well above background
> weak <- (present & probe & manual & weak)
> weak <- grep(TRUE,weak)
> y[weak] <- 0
>
> #Weight flagged spots 0
>
> sat <- x$gIsSaturated == 0
> xdr <- x$gIsLowPMTScaledUp == 0
> featureOL1 <- x$gIsFeatNonUnifOL == 0
> featureOL2 <- x$gIsFeatPopnOL == 0
> flagged <- (sat & xdr & featureOL1 & featureOL2)
> flagged <- grep(FALSE, flagged)
> good <- grep(TRUE, y==1)
> flagged <- intersect(flagged, good)
> y[flagged] <- 0
> y
> }
>
> targets <- readTargets("targets.txt")
> rawObj<-read.maimages(AgilentFiles, columns = list(G = "gMeanSignal", Gb = "gBGUsed", R ="gProcessedSignal", Rb = "gBGMedianSignal"),
> annotation= c("Row", "Col", "FeatureNum", "ProbeUID", "ControlType", "ProbeName", "GeneName", "SystematicName"), wt.fun=myFlagFun)
> Obj.corrected <- backgroundCorrect(rawObj, method="normexp", offset=1)
> Obj<-Obj.corrected
> Obj$R <- normalizeBetweenArrays(Obj.corrected$R, method="quantile")
> Obj$R <- log2(Obj$R)
>
> E <- new("MAList", list(targets=Obj$targets, genes=Obj$genes, weights=Obj$weights, source=Obj$source, M=Obj$R, A=Obj$G))
> E.avg <- avereps(E, ID=E$genes$ProbeName)
>
> design<- as.matrix(read.table("targets.txt", row.names="FileName",header=T))
> fit<-lmFit(E.avg$M,design, weights=E.avg$weights)
> cont.matrix <- makeContrasts(group1vsgroup2=Group1-Group2, levels=design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2)
> topTable(fit2, genelist=fit2$genes, adjust="BH")
>
>
> Patrick De Boever, PhD, MSc
> Flemish Institute for Technological Research (VITO)
> Unit Environmental Risk and Health, Toxicology group
> Industriezone Vlasmeer 7, 2400 Mol, Belgium
> Tel. + 32 14 33 51 45
> Fax. + 32 14 58 05 23
> patrick.deboever at vito.be<mailto:patrick.deboever at vito.be>
>
> Visit our website: www.vito.be/english<http://www.vito.be/english>
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list