[BioC] Limma and Genepix

Tue May 17 13:45:43 CEST 2005

This is the pipeline I am currently using for analysis.  I would like 
people's opinions on whether anything could be done better.  (The experiment 
is 3 sets of dye-swaps, with 2 spots per ORF per chip.)

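library(limma)
# Read the targets file and the GenePix output; wtflags(0) gives flagged spots zero weight
targets<-readTargets("targets.txt") 
RG<-read.maimages(targets$FileName,source="genepix",wt.fun=wtflags(0))
# Print layout, gene annotation from the GAL file, and control spot types
RG$printer<-getLayout(RG$genes)
RG$genes<-readGAL("Y_pestis.sorted.gal")
spottypes<-readSpotTypes("spotTypes.txt")
RG$genes$Status<-controlStatus(spottypes,RG)
# Background correction with the normexp method
RGb<-backgroundCorrect(RG,method="normexp")
# Normalize within arrays, then between arrays (default methods)
MA<-normalizeWithinArrays(RGb)
MA<-normalizeBetweenArrays(MA)
# Correlation between the 2 duplicate spots, which are 240 spots apart
cor<-duplicateCorrelation(MA,ndups=2,spacing=240)
# Dye-swap design: 1 and -1 mark the two dye orientations
design<-c(1,-1,1,-1,-1,1)
fit<-lmFit(MA,design,ndups=2,correlation=cor$consensus.correlation,spacing=240)
fit<-eBayes(fit)
# Write out the ranked gene list with FDR-adjusted p-values
tt<-topTable(fit,adjust="fdr",n=6000)
write.table(tt,file="tmp.txt",sep="\t")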

I have also recently read about the Kooperberg method for background 
correction.  Is this a preferred method?
I have been able to do this with the following commands:

# Same set-up as in the first pipeline
targets<-readTargets("targets.txt")
RG<-read.maimages(targets$FileName,source="genepix",wt.fun=wtflags(0))
RG$printer<-getLayout(RG$genes)
RG$genes<-readGAL("Y_pestis.sorted.gal")
spottypes<-readSpotTypes("spotTypes.txt")
RG$genes$Status<-controlStatus(spottypes,RG)
# Load each GenePix file as a data frame, which kooperberg() then uses
read.series(targets$FileName, suffix=NULL, skip=31, sep="\t")
# Kooperberg background correction in place of backgroundCorrect()
RGb <- kooperberg(targets$FileName, layout=RG$printer)
# Copy annotation, layout, weights and targets over to the corrected object
RGb$genes<-RG$genes
RGb$printer<-RG$printer
RGb$weights<-RG$weights
RGb$targets<-RG$targets
# From here on, identical to the first pipeline
MA<-normalizeWithinArrays(RGb)
MA<-normalizeBetweenArrays(MA)
cor<-duplicateCorrelation(MA,ndups=2,spacing=240)
design<-c(1,-1,1,-1,-1,1)
fit<-lmFit(MA,design,ndups=2,correlation=cor$consensus.correlation,spacing=240)
fit<-eBayes(fit)
# Print the top 32 genes, then write out the full ranked list
topTable(fit,adjust="fdr",n=32)
tt<-topTable(fit,adjust="fdr",n=6000)
write.table(tt,file="tmp.txt",sep="\t")

I recently had a small argument with an advisor, who told me to do 
background correction by subtracting background from foreground and 
flagging negative numbers.  This is obviously the default for limma.  But 
with this approach, a lot of spots popped up that didn't make sense 
(i.e. non-specific DNA), while normexp fixed that problem.  I recently 
discovered Kooperberg, which was designed for the problem of negative 
intensities with GenePix data.  So which is the best method, and how do I 
convince this guy that I should use it?
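
To illustrate why I distrust plain subtraction at low intensities, here is a 
toy simulation.  It is only a sketch with made-up numbers (background of 100, 
noise sd of 20, no true signal in either channel), not my data, and it uses 
base R only.

```r
# Simulated example only: 1000 unexpressed spots whose true signal is zero,
# so the foreground is just background plus noise in both channels.
set.seed(1)
n  <- 1000
Rb <- rep(100, n)                        # red local background (assumed value)
Gb <- rep(100, n)                        # green local background (assumed value)
Rf <- Rb + rnorm(n, mean = 0, sd = 20)   # red foreground
Gf <- Gb + rnorm(n, mean = 0, sd = 20)   # green foreground

# Plain subtraction: values at or below zero cannot be logged, so flag them
R <- Rf - Rb
G <- Gf - Gb
flagged <- R <= 0 | G <= 0

# Log-ratios of the spots that survive flagging
M <- log2(R[!flagged] / G[!flagged])

mean(flagged)       # fraction of spots lost to flagging
mean(abs(M) > 1)    # fraction of survivors that look "more than two-fold"
```

Even though nothing is expressed here, a good share of the surviving spots 
show a log-ratio beyond two-fold, which is the kind of spot I was seeing.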

One last question: these methods give you statistics on gene expression 
differences, but people often report genes that are differentially 
regulated by more than two-fold.  It seems to me that to do this, one 
would need an intensity cutoff, as genes with little or no expression can 
easily slip into that category.  How would one calculate such a cutoff?  
There are spots on the array that contain oligos that are definitely not 
found in the species being studied (bacteria vs Arabidopsis).  Can this 
information be used?
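
The kind of thing I have in mind is sketched below.  It is only a rough idea 
on simulated values: the vectors stand in for MA$A and MA$genes$Status, the 
status label "Negative" is hypothetical (it would depend on what is in 
spotTypes.txt), and the 95th percentile is an arbitrary choice on my part.

```r
# Toy data standing in for MA$A (average log2 intensity) and MA$genes$Status.
# The label "Negative" is hypothetical; the real one comes from spotTypes.txt.
set.seed(2)
status <- c(rep("Negative", 50), rep("orf", 950))
A <- c(rnorm(50,  mean = 7,  sd = 0.5),   # Arabidopsis oligos: background only
       rnorm(950, mean = 10, sd = 1.5))   # bacterial ORFs

# Cutoff: 95th percentile of A over the negative-control spots (arbitrary choice)
cutoff <- quantile(A[status == "Negative"], probs = 0.95)

# Keep only spots brighter than nearly all of the negative controls
keep <- A > cutoff
table(status, keep)
```

The fold-change filter would then be applied only to the spots with keep 
equal to TRUE.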

Thanks,
Lance Palmer