[BioC] ChIP-chip sequence bias not removed

Raphael Gottardo raphaelgottardo at mac.com
Tue Jul 27 09:23:52 CEST 2010


Bendikt is right; however, even in the presence of a control (Input DNA or Mock-IP), a sequence-based normalization will help you. 
Please have a look at the supplementary material of the rMAT paper, where we show this on two datasets.

http://bioinformatics.oxfordjournals.org/cgi/content/abstract/26/5/678


You might also want to try the rMAT package available from BioC.
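
The difference between the two approaches can be illustrated with a minimal sketch in base R on simulated data. This is not the rMAT code or its API: the probes, the effect sizes and all variable names below are made up for illustration, and the model only uses per-probe base counts, whereas MAT-style models are richer (for example, position-specific base terms).

```r
set.seed(1)

# Simulated 25-mer probes (hypothetical data, not the AtTile1F array)
n_probes <- 2000
bases <- c("A", "C", "G", "T")
probes <- replicate(n_probes,
                    paste(sample(bases, 25, replace = TRUE), collapse = ""))

# Count how often a given base occurs in each probe sequence
count_base <- function(seqs, base) {
  vapply(strsplit(seqs, ""), function(x) sum(x == base), integer(1))
}
nA <- count_base(probes, "A")
nC <- count_base(probes, "C")
nG <- count_base(probes, "G")
gc_count <- nC + nG

# Assumption: the same GC-dependent probe effect is present in IP and input
probe_effect <- 0.3 * (gc_count - mean(gc_count))
ip    <- 8 + probe_effect + rnorm(n_probes, sd = 0.3)
input <- 8 + probe_effect + rnorm(n_probes, sd = 0.3)

# Option 1: subtract the reference (log2 scale), cancelling the probe effect
log_ratio <- ip - input

# Option 2: sequence-based normalization of a single array,
# i.e. regress log intensity on base counts and keep the residuals
fit <- lm(ip ~ nA + nC + nG)
ip_seq_norm <- residuals(fit)

# Slope of intensity versus GC count as a crude measure of GC bias
gc_slope <- function(y) unname(coef(lm(y ~ gc_count))[2])

cat("GC slope, raw IP:             ", gc_slope(ip), "\n")
cat("GC slope, IP minus input:     ", gc_slope(log_ratio), "\n")
cat("GC slope, sequence-normalized:", gc_slope(ip_seq_norm), "\n")
```

In this toy setting both options flatten the GC trend. On real arrays the probe effect in the IP and the control need not be identical, which is one reason a sequence-based step can still help when a control is available.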

Raphael

On 2010-07-26, at 7:43 PM, zacher at lmb.uni-muenchen.de wrote:

> 
> 
> Dear Edwin,
> 
> as I guess from inspecting your code chunk, you did not subtract a reference experiment from the IP. To eliminate the sequence-dependent bias, you need to either subtract a reference experiment (such as Mock-IP or genomic input) or apply a specific normalization method such as MAT, which is designed for this purpose. If you have a reference experiment, I strongly recommend using it instead of MAT, as it performs a lot better in my experience.
> Best regards,
> 
> Bendikt
> 
> Edwin Groot <edwin.groot at biologie.uni-freiburg.de> wrote:
> 
>> Hello all,
>> I am analyzing Affymetrix AtTile1F ChIP-chip data from GEO to compare
>> the localization of different histone modifications in Arabidopsis. The
>> goal is to query a genomic region for relative enrichment of the
>> different histone modifications.
>> After trying several normalization methods in Starr, I get good MA
>> plots, densities and histograms, but neither the GC-bias, nor the
>> base-position bias is changed by any normalization method. The vignette
>> data, in contrast, shows great improvement in the bias problems. Have I
>> missed something? Should I worry about this?
>> I have so far tried loess, vsn, quantile and rankpercentile through
>> Starr.
>> 
>> Thanks,
>> Edwin
>> -- 
>> Here is sample code for one of the normalization methods:
>> > library(Starr)
>> > library(geneplotter)
>> > library(vsn)
>> > AtTile1F <- readBpmap("GPL1979.bpmap")
>> #Only the + strand is represented for all chromosomes
>> > summary(AtTile1F$"At:TIGRv5;chr4"$strand)
>> > cels <- c("h3k27me301.CEL", "h3k27me303.CEL", "h3k27me302.CEL",
>> +           "h3k27me304.CEL", "input01.CEL", "input03.CEL",
>> +           "input02.CEL", "input04.CEL")
>> > names <- c("k27me301", "k27me302", "k27me303", "k27me304",
>> +            "input01", "input02", "input03", "input04")
>> > type <- c("IP", "IP", "IP", "IP",
>> +           "INPUT", "INPUT", "INPUT", "INPUT")
>> > k27me3 <- readCelFile(AtTile1F, cels, names, type, featureData=TRUE,
>> +                       log.it=TRUE)
>> #Normalize
>> > k27me3_loess <- normalize.Probes(k27me3, method = "loess")
>> #QC
>> #Try only one pair of IP and control.
>> > ips <- c(TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE)
>> > controls <- c(FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE)
>> > plotMA(k27me3, ip = ips, control = controls)
>> #There is a negative deviation down to -1.5 LFC
>> > plotMA(k27me3_loess, ip = ips, control = controls)
>> #The MA is straight, except for a slight negative bias at highest intensity.
>> > plotGCbias(exprs(k27me3)[, 1], featureData(k27me3)$seq,
>> +            main=paste(sampleNames(k27me3)[1], "GC Bias Before Normalization"))
>> #The GC bias increases linearly with base position.
>> > plotGCbias(exprs(k27me3_loess)[, 1], featureData(k27me3_loess)$seq,
>> +            main=paste(sampleNames(k27me3_loess)[1],
>> +                       "GC Bias After Loess Normalization"))
>> #Same rise (-2 to +2) with base position as Before Normalization.
>> -- 
>> > sessionInfo()
>> R version 2.11.1 (2010-05-31) 
>> x86_64-pc-linux-gnu 
>> 
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
>> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
>> [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8   
>> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
>> [9] LC_ADDRESS=C               LC_TELEPHONE=C            
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
>> 
>> attached base packages:
>> [1] grid      stats     graphics  grDevices datasets  utils     methods
>> [8] base     
>> 
>> other attached packages:
>> [1] vsn_3.16.0           geneplotter_1.26.0   annotate_1.26.0     
>> [4] AnnotationDbi_1.10.1 Starr_1.4.4          affxparser_1.20.0   
>> [7] affy_1.26.1          Ringo_1.12.0         Matrix_0.999375-40  
>> [10] lattice_0.18-8       limma_3.4.3          RColorBrewer_1.0-2  
>> [13] Biobase_2.8.0       
>> 
>> loaded via a namespace (and not attached):
>> [1] affyio_1.16.0         DBI_0.2-5             genefilter_1.30.0    
>> [4] MASS_7.3-6            preprocessCore_1.10.0 pspline_1.0-14       
>> [7] RSQLite_0.9-1         splines_2.11.1        survival_2.35-8      
>> [10] xtable_1.5-6         
>> 
>> Dr. Edwin Groot, postdoctoral associate
>> AG Laux
>> Institut fuer Biologie III
>> Schaenzlestr. 1
>> 79104 Freiburg, Deutschland
>> +49 761-2032945
>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 


