[BioC] DiffBind - overlap between different peak callers
Paolo Kunderfranco
paolo.kunderfranco at gmail.com
Tue Aug 7 12:27:16 CEST 2012
Dear All,
I am using package DiffBind,
Due to the fact that I miss real biological or technical replicates, I
am assesing the overlapping rates between different peak callers.
Here is how I set up my database to read in:
3 Cell lines, 4 Factor, 2 Replicates, for a total of 12 Conditions
> rm(list=ls())
> setwd("/home/pkunderf/output/DiffBind/ES-CMN-CMA/")
> test=dba(sampleSheet="database.csv")
> test
24 Samples, 19380 sites in matrix (40809 total):
ID Tissue Factor Condition Replicate Intervals
1 mES_H3K27me3_m ES H3K27 mES_H3K27me3 1 1834
2 CMp_H3K27me3_m CMN H3K27 CMp_H3K27me3 1 2450
3 CMa_H3K27me3_m CMA H3K27 CMa_H3K27me3 1 990
4 mES_H3K27me3_s ES H3K27 mES_H3K27me3 2 2188
5 CMp_H3K27me3_s CMN H3K27 CMp_H3K27me3 2 3388
6 CMa_H3K27me3_s CMA H3K27 CMa_H3K27me3 2 5946
7 mES_H3K4me3_m ES H3K4 mES_H3K4me3 1 10243
8 CMp_H3K4me3_m CMN H3K4 CMp_H3K4me3 1 12476
9 CMa_H3K4me3_m CMA H3K4 CMa_H3K4me3 1 5632
10 mES_H3K4m3_s ES H3K4 mES_H3K4me3 2 14917
11 CMp_H3K4me3_s CMN H3K4 CMp_H3K4me3 2 10646
12 CMa_H3K4me3_s CMA H3K4 CMa_H3K4me3 2 5985
13 mES_H3K9me3_m ES H3K9 mES_H3K9me3 1 110
14 CMp_H3K9me3_m CMN H3K9 CMp_H3K9me3 1 484
15 CMa_H3K9me3_m CMA H3K9 CMa_H3K9me3 1 938
16 mES_H3K9me3_s ES H3K9 mES_H3K9me3 2 569
17 CMp_H3K9me3_s CMN H3K9 CMp_H3K9me3 2 808
18 CMa_H3K9me3_s CMA H3K9 CMa_H3K9me3 2 3747
19 mES_H3K79me2_m ES H3K79 mES_H3K79me2 1 21318
20 CMp_H3K79me2_m CMN H3K79 CMp_H3K79me2 1 13210
21 CMa_H3K79me2_m CMA H3K79 CMa_H3K79me2 1 2988
22 mES_H3K79me2_s ES H3K79 mES_H3K79me2 2 22164
23 CMp_H3K79me2_s CMN H3K79 CMp_H3K79me2 2 13429
24 CMa_H3K79me2_s CMA H3K79 CMa_H3K79me2 2 6063
> test=dba.count(test)
> test=dba.contrast(test, categories=DBA_CONDITION,minMembers=2)
#minMemebers was se to 2 due to
the fact I have 2 replicates for each condition
#....78 Contrasts:
> test = dba.analyze(test)
#for each condition generate a report
>test.DB1=dba.report(test,contrast=1)
I would like now to assess overlapping rates between replicates, and
generate a venn diagram.
but it seems like replicates are merged together and I cannot treat
them as single samples.
I noticed this behaviour also when i try to generate both a PCA plot
and a heatmap plot, replicates are merged, so
what's the utility to display both in the PCA or heatmap plot?
would it be simple to display the original replicates to assess how do
they cluster?
>pdf('PCA_plot_DBA_CONDITION.pdf')
>dba.plotPCA(test,attributes=DBA_CONDITION, vColors=c("grey0","grey23","grey47","antiquewhite4","antiquewhite3","antiquewhite1","dodgerblue4","dodgerblue","darkslategray1","brown4","brown","coral"))
>dev.off()
>pdf('HeatMap.pdf')
>corvals=dba.plotHeatmap(test)
>dev.off()
>pdf('HeatMap_RPKM_FOLD.pdf')
>corvals=dba.plotHeatmap(test,score=DBA_SCORE_RPKM_FOLD)
>dev.off()
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] DiffBind_1.2.0 Biobase_2.16.0 GenomicRanges_1.8.9
[4] IRanges_1.14.4 BiocGenerics_0.2.0
loaded via a namespace (and not attached):
[1] amap_0.8-7 edgeR_2.6.10 gdata_2.11.0 gplots_2.11.0
[5] gtools_2.7.0 limma_3.12.1 RColorBrewer_1.0-5 stats4_2.15.1
[9] tools_2.15.1 zlibbioc_1.2.0
Any sugestion is appreciated,
cheers,
paolo
More information about the Bioconductor
mailing list