[BioC] Limma and Genepix
lepalmer at notes.cc.sunysb.edu
Tue May 17 13:45:43 CEST 2005
This is the pipeline I have currently been using for analysis. I would
like people's opinions on whether anything can be done better. (It's 3 sets
of dye-swaps with 2 spots per ORF per chip.)
library(limma)
targets<-readTargets("targets.txt")
RG<-read.maimages(targets$FileName,source="genepix",wt.fun=wtflags(0))
RG$printer<-getLayout(RG$genes)
RG$genes<-readGAL("Y_pestis.sorted.gal")
spottypes<-readSpotTypes("spotTypes.txt")
RG$genes$Status<-controlStatus(spottypes,RG)
RGb<-backgroundCorrect(RG,method="normexp")
MA<-normalizeWithinArrays(RGb)
MA<-normalizeBetweenArrays(MA)
cor<-duplicateCorrelation(MA,ndups=2,spacing=240)
design<-c(1,-1,1,-1,-1,1)
fit<-lmFit(MA,design,ndups=2,correlation=cor$consensus.correlation,spacing=240)
fit<-eBayes(fit)
tt<-topTable(fit,adjust="fdr",n=6000)
write.table(tt,file="tmp.txt",sep="\t")
I have also recently read about the Kooperberg method for background
correction. Is this a preferred method?
I have been able to do this with the following commands:
targets<-readTargets("targets.txt")
RG<-read.maimages(targets$FileName,source="genepix",wt.fun=wtflags(0))
RG$printer<-getLayout(RG$genes)
RG$genes<-readGAL("Y_pestis.sorted.gal")
spottypes<-readSpotTypes("spotTypes.txt")
RG$genes$Status<-controlStatus(spottypes,RG)
read.series(targets$FileName, suffix=NULL, skip=31, sep="\t")
RGb <- kooperberg(targets$FileName, layout=RG$printer)
RGb$genes<-RG$genes
RGb$printer<-RG$printer
RGb$weights<-RG$weights
RGb$targets<-RG$targets
MA<-normalizeWithinArrays(RGb)
MA<-normalizeBetweenArrays(MA)
cor<-duplicateCorrelation(MA,ndups=2,spacing=240)
design<-c(1,-1,1,-1,-1,1)
fit<-lmFit(MA,design,ndups=2,correlation=cor$consensus.correlation,spacing=240)
fit<-eBayes(fit)
topTable(fit,adjust="fdr",n=32)
tt<-topTable(fit,adjust="fdr",n=6000)
write.table(tt,file="tmp.txt",sep="\t")
I recently had a small argument with an advisor who told me to do
background correction by subtracting background from foreground and
flagging negative numbers. This is obviously the default for limma. But
with this approach, a lot of spots popped up that didn't make sense
(i.e. non-specific DNA), while normexp fixed that problem. I recently
discovered Kooperberg, which was designed for the problem of negative
intensities with GenePix data. So which is the best method, and how do I
convince this guy that I should use it?
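To make the concern about plain subtraction concrete, here is a toy
illustration in base R. All intensities are invented for the example (they
are not from these arrays), and the variable names are hypothetical. It only
shows the arithmetic: a spot sitting near background can produce a large
log-ratio, and a spot whose corrected value goes negative has no log-ratio
at all.

```r
# Invented foreground/background intensities for three spots:
# spot 1 is well expressed, spots 2 and 3 sit close to background.
R_fg <- c(5000, 105,  98); R_bg <- c(100, 100, 100)
G_fg <- c(2600, 101, 104); G_bg <- c(100, 100, 100)

# Plain background subtraction, as in method = "subtract".
R <- R_fg - R_bg
G <- G_fg - G_bg

# Log-ratio per spot; a negative corrected intensity yields NaN.
M <- suppressWarnings(log2(R / G))

print(R)
print(G)
print(M)
```

In this toy case the well-expressed spot shows a little under two-fold
change, the second spot shows an apparent five-fold change that rests on a
difference of only a few intensity units, and the third spot is lost.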
One last question: these methods give you some statistics on gene
expression differences. Often people report genes that are differentially
regulated by more than two-fold. It seems to me that to do this, one would
need an intensity cutoff, as genes with little or no expression can easily
slip into that category. How would one calculate such a cutoff? There are
spots on the array that contain oligos that are definitely not found in
the species being studied (bacteria vs. Arabidopsis). Can this information
be used?
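One possible way to frame the cutoff question, sketched in base R on
invented numbers: take an upper quantile of the average log-intensity (A)
of the negative-control spots as the threshold, and report two-fold changes
only for genes above it. The status label "negative", the 95% level, and
all values below are assumptions made for illustration. In a limma pipeline
the corresponding quantities might be taken from MA$A and MA$genes$Status,
but this is only a sketch, not an established recommendation.

```r
# Invented per-spot summaries: A = average log2 intensity,
# M = log2 fold change, status = spot type.
# "negative" is an assumed label for the non-bacterial control oligos.
A <- c(6.1, 6.4, 5.9, 6.8, 7.0, 11.2, 9.5, 6.5, 12.0, 8.9)
M <- c(1.5, -1.2, 0.3, 2.1, -0.4, 1.8, -1.1, 1.3, 0.2, 0.9)
status <- c(rep("negative", 5), rep("gene", 5))

# Cutoff: an upper quantile of the negative-control intensities.
# The 95% level is an arbitrary choice for this sketch.
cutoff <- quantile(A[status == "negative"], probs = 0.95, names = FALSE)

# Keep genes above the cutoff with at least a two-fold change (|M| >= 1).
keep <- status == "gene" & A > cutoff & abs(M) >= 1

print(cutoff)
print(which(keep))
```

Here the gene with A = 6.5 passes the two-fold filter on M alone but is
excluded because its intensity is no higher than that of the negative
controls.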
Thanks,
Lance Palmer