[BioC] ChIP-chip sequence bias not removed

Thu Jul 29 15:33:13 CEST 2010

On Mon, 26 Jul 2010 20:43:01 +0200
 <zacher at lmb.uni-muenchen.de> wrote:
> 
> 
> Dear Edwin,
> 
> as I guess from inspecting your code chunk, you did not subtract a
> reference experiment from the IP. To eliminate the sequence-dependent
> bias you either need to subtract a reference experiment (like
> Mock-IP or genomic input) or apply a specific normalization method
> like MAT, which is designed for this purpose. If you have a reference
> experiment I absolutely recommend using this instead of MAT, as it
> performs a lot better in my experience.
> Best regards,
> 
> Benedikt

Hello Benedikt,
Thanks for your reply, but I am a bit confused about the ChIP-chip data
analysis procedure.
My experience with Affymetrix gene expression arrays is to run RMA on
the PM probes with the quantile normalization option, then use limma to
analyze ratios among experiments.
In the Starr package there is no GCRMA (which would help correct the
base-position bias). There is not even an RMA procedure!

My goal is to properly background-subtract and normalize the ChIP-chip
data so that I can obtain log2 enrichment ratios (IP / input). I am
feeling somewhat embarrassed at being stuck in the preprocessing stage.
What do you mean by subtracting the input samples? Is it a form of
background subtraction? I plan to use these same input samples to make
my log2 enrichment ratios!
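
To make the question concrete: if "subtracting the reference" simply
means forming the per-probe log ratio, then on log2 data
log2(IP) - log2(input) = log2(IP / input), and I would expect a probe
effect shared by both channels to cancel. Below is a minimal sketch of
that idea in plain R, not using the Starr API. All intensities are
simulated, and the names (probe_effect, ip, input, enrichment) are made
up for illustration only.

```r
set.seed(1)
n_probes <- 1000

# Hypothetical sequence-dependent probe effect (e.g. driven by GC content),
# assumed to be shared by every hybridization on the same array design.
probe_effect <- rnorm(n_probes, mean = 0, sd = 1)

# Simulated log2 intensities: 4 IP and 4 input arrays, no true enrichment.
ip    <- sapply(1:4, function(i) 8 + probe_effect + rnorm(n_probes, sd = 0.2))
input <- sapply(1:4, function(i) 8 + probe_effect + rnorm(n_probes, sd = 0.2))

# log2(IP / input) = log2(IP) - log2(input): subtract the per-probe input
# median from the per-probe IP median.
enrichment <- apply(ip, 1, median) - apply(input, 1, median)

# Correlation of the probe effect with the IP signal alone, and with the ratio.
cor_before <- cor(probe_effect, apply(ip, 1, median))
cor_after  <- cor(probe_effect, enrichment)
```

If this is the right picture, the bias plots would only be expected to
flatten on the ratio, not on the individual normalized arrays.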

Regards,
Edwin
-- 
> 
> Edwin Groot <edwin.groot at biologie.uni-freiburg.de> wrote:
> 
> > Hello all,
> > I am analyzing Affymetrix AtTile1F ChIP-chip data from GEO to compare
> > the localization of different histone modifications in Arabidopsis. The
> > goal is to query a genomic region for relative enrichment of the
> > different histone modifications.
> > After trying several normalization methods in Starr, I get good MA
> > plots, densities and histograms, but neither the GC-bias, nor the
> > base-position bias is changed by any normalization method. The vignette
> > data, in contrast, shows great improvement in the bias problems. Have I
> > missed something? Should I worry about this?
> > I have so far tried loess, vsn, quantile and rankpercentile through
> > Starr.
> > 
> > Thanks,
> > Edwin
> > -- 
> > Here is sample code for one of the normalization methods:
> > > library(Starr)
> > > library(geneplotter)
> > > library(vsn)
> > > AtTile1F <- readBpmap("GPL1979.bpmap")
> > #Only the + strand is represented for all chromosomes
> > > summary(AtTile1F$"At:TIGRv5;chr4"$strand)
> > > cels <- c("h3k27me301.CEL", "h3k27me303.CEL", "h3k27me302.CEL",
> >     "h3k27me304.CEL", "input01.CEL", "input03.CEL", "input02.CEL",
> >     "input04.CEL")
> > > names <- c("k27me301", "k27me302", "k27me303", "k27me304",
> >     "input01", "input02", "input03", "input04")
> > > type <- c("IP", "IP", "IP", "IP", "INPUT", "INPUT", "INPUT", "INPUT")
> > > k27me3 <- readCelFile(AtTile1F, cels, names, type, featureData=TRUE,
> >     log.it=TRUE)
> > #Normalize
> > > k27me3_loess <- normalize.Probes(k27me3, method = "loess")
> > #QC
> > #Try only one pair of IP and control.
> > > ips <- c(TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE)
> > > controls <- c(FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE)
> > > plotMA(k27me3, ip = ips, control = controls)
> > #There is a negative deviation down to -1.5 LFC
> > > plotMA(k27me3_loess, ip = ips, control = controls)
> > #The MA is straight, except for a slight negative bias at highest
> > #intensity.
> > > plotGCbias(exprs(k27me3)[, 1], featureData(k27me3)$seq,
> >     main=paste(sampleNames(k27me3)[1],"GC Bias Before Normalization"))
> > #The GC bias increases linearly with base position.
> > > plotGCbias(exprs(k27me3_loess)[, 1], featureData(k27me3_loess)$seq,
> >     main=paste(sampleNames(k27me3_loess)[1],"GC Bias After Loess Normalization"))
> > #Same rise (-2 to +2) with base position as Before Normalization.
> > -- 
> > > sessionInfo()
> > R version 2.11.1 (2010-05-31) 
> > x86_64-pc-linux-gnu 
> > 
> > locale:
> >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
> >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
> >  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8   
> >  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
> > 
> > attached base packages:
> > [1] grid      stats     graphics  grDevices datasets  utils     methods
> > [8] base     
> > 
> > other attached packages:
> >  [1] vsn_3.16.0           geneplotter_1.26.0   annotate_1.26.0     
> >  [4] AnnotationDbi_1.10.1 Starr_1.4.4          affxparser_1.20.0   
> >  [7] affy_1.26.1          Ringo_1.12.0         Matrix_0.999375-40  
> > [10] lattice_0.18-8       limma_3.4.3          RColorBrewer_1.0-2  
> > [13] Biobase_2.8.0       
> > 
> > loaded via a namespace (and not attached):
> >  [1] affyio_1.16.0         DBI_0.2-5             genefilter_1.30.0    
> >  [4] MASS_7.3-6            preprocessCore_1.10.0 pspline_1.0-14       
> >  [7] RSQLite_0.9-1         splines_2.11.1        survival_2.35-8      
> > [10] xtable_1.5-6         
> > 
> > Dr. Edwin Groot, postdoctoral associate
> > AG Laux
> > Institut fuer Biologie III
> > Schaenzlestr. 1
> > 79104 Freiburg, Deutschland
> > +49 761-2032945

Dr. Edwin Groot, postdoctoral associate
AG Laux
Institut fuer Biologie III
Schaenzlestr. 1
79104 Freiburg, Deutschland
+49 761-2032945