[BioC] LIMMA and Agilent single color

Gordon K Smyth smyth at wehi.EDU.AU
Sun Dec 19 02:15:00 CET 2010


Dear Patrick,

Before I answer your questions, can I make a comment on the practice of 
flagging bad spots?  Your code is flagging spots as bad just because they 
are give low intensities.  I don't agree with this practice, because it 
may be a perfectly good observation of a gene that is not expressed in a 
particular condition.  While it is good practice to filter genes that are 
not expressed in any condition in your experiment, this means filtering 
whole rows of your data.  I don't think that the practice of filtering 
individual spots on the basis of intensity is a useful one.

Have said that, I've interpolated answers to your questions below.

> Date: Fri, 17 Dec 2010 17:32:14 +0100
> From: De Boever Patrick <patrick.deboever at vito.be>
> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Subject: [BioC] LIMMA and Agilent single color
>
> Dear list members,
>
> My question relates to processing of Agilent single-color arrays. With 
> data generated using Agilent's feature extraction software.

> My script so-for is mentioned below.

> The flag information contained in raw data files is transformed to 
> weights. I have used 1 for 'good' and 0 for 'bad' in myFlagFun.
>
> If this weight information is loaded:

> -where does LIMMA use this information?

Pretty much everywhere.

> Does the lmFit,contrastsFit need additional statements?

No.  The weights are found automatically, provided that you pass the data 
objects to the functions.  Of course, if you extract the expression matrix 
from the data object, then you have to pass the weights explicitly to 
lmFit().

> -what happens if a gene is flagged 'bad' on one array, but not on the 
> other?

Remember that it is spots that are being flagged, not genes.  For gene 
(probe), the good spots are used and the bad spots are not.

> -is there a way to verify that only 'good' genes have been used?

Not directly.  But summary(fit$df.residual) will show the residual degrees 
of freedoms unequal if variable numbers of values have been used.

> Second question,

> My E.avg (MAlist) contains E.avg$genes with gene information, But after 
> applying fit->this gene information is lost? No gene annotation my 
> topTable.

This is because you didn't pass the information about the genes to 
lmFit().  If you use

   fit <- lmFit(E.avg, design)

then it will automatically find both the weights and the gene annotation. 
But if you use,

   fit <- lmFit(E.avg$M, design)

then you're passing only the expression log-ratios to lmFit, and it has 
no way to find the gene annotation.

> Am I missing an argument?

You're not making full use of the data objects.  limma now has better 
features to handle single-channel Agilent arrays.  The following code 
would be more direct:

   targets <- readTargets("targets.txt")
   rawObj <- read.maimages(targets,source="agilent",green.only=TRUE,wt.fun=myFlagFun)
   Obj.corrected <- backgroundCorrect(rawObj, method="normexp", offset=1)
   E <- normalizeBetweenArrays(Obj.corrected, method="quantile")
   E.avg <- avereps(E, ID=E$genes$ProbeName)
   fit <- lmFit(E.avg,design)
   cont.matrix <- makeContrasts(group1vsgroup2=Group1-Group2,levels=design)
   fit2 <- contrasts.fit(fit,cont.matrix)
   fit2 <- eBayes(fit2)
   topTable(fit2)

Note that avereps() doesn't use the weights when making averages.  If you 
average two spots, one with weight 0 and one with weight 1, you'll get the 
average of the two with weight 0.5.

Best wishes
Gordon

> Thank you for providing insight !
>
> Patrick
>
>
> setwd('D:/VITO/R')
> library('Biobase')
> library('convert')
> library('limma')
> AgilentFiles <- list.files(pattern="US")
> myFlagFun <- function(x) {
> #Weight only strongly positive spots 1, everything else 0
> present <- x$gIsPosAndSignif == 1
> probe <- x$ControlType == 0
> manual <- x$IsManualFlag == 0
> strong <- x$gIsWellAboveBG == 1
> y <- as.numeric(present & probe & manual & strong)
>
> #Weight weak spots 0
>
> weak <- strong == FALSE #with values not well above background
> weak <- (present & probe & manual & weak)
> weak <- grep(TRUE,weak)
> y[weak] <- 0
>
> #Weight flagged spots 0
>
> sat <- x$gIsSaturated == 0
> xdr <- x$gIsLowPMTScaledUp == 0
> featureOL1 <- x$gIsFeatNonUnifOL == 0
> featureOL2 <- x$gIsFeatPopnOL == 0
> flagged <- (sat & xdr & featureOL1 & featureOL2)
> flagged <- grep(FALSE, flagged)
> good <- grep(TRUE, y==1)
> flagged <- intersect(flagged, good)
> y[flagged] <- 0
> y
> }
>
> targets <- readTargets("targets.txt")
> rawObj<-read.maimages(AgilentFiles, columns = list(G = "gMeanSignal", Gb = "gBGUsed", R ="gProcessedSignal", Rb = "gBGMedianSignal"),
> annotation= c("Row", "Col", "FeatureNum", "ProbeUID", "ControlType", "ProbeName", "GeneName", "SystematicName"), wt.fun=myFlagFun)
> Obj.corrected <- backgroundCorrect(rawObj, method="normexp", offset=1)
> Obj<-Obj.corrected
> Obj$R <- normalizeBetweenArrays(Obj.corrected$R, method="quantile")
> Obj$R <- log2(Obj$R)
>
> E <- new("MAList", list(targets=Obj$targets, genes=Obj$genes, weights=Obj$weights, source=Obj$source, M=Obj$R, A=Obj$G))
> E.avg <- avereps(E, ID=E$genes$ProbeName)
>
> design<- as.matrix(read.table("targets.txt", row.names="FileName",header=T))
> fit<-lmFit(E.avg$M,design, weights=E.avg$weights)
> cont.matrix <- makeContrasts(group1vsgroup2=Group1-Group2, levels=design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2)
> topTable(fit2, genelist=fit2$genes, adjust="BH")
>
>
> Patrick De Boever, PhD, MSc
> Flemish Institute for Technological Research (VITO)
> Unit Environmental Risk and Health, Toxicology group
> Industriezone Vlasmeer 7, 2400 Mol, Belgium
> Tel.   + 32 14 33 51 45
> Fax.  + 32 14 58 05 23
> patrick.deboever at vito.be<mailto:patrick.deboever at vito.be>
>
> Visit our website: www.vito.be/english<http://www.vito.be/english>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list