[BioC] About the DiffBind dba.count() crash problems

kentanaka at chiba-u.jp kentanaka at chiba-u.jp
Fri Jun 7 00:50:17 CEST 2013


Dear Dr. Brown, 

I asked the questions earlier regarding the DiffBind last week. 

I have versioned up to R3.0.1 and Bioconductor 2.12, and I checked to 
see how DiffBind 1.6.2 works. But the dba.count() still crashes in the 
bed data and I attached the logs of the bed data below.

Since the versioned up DiffBind 1.6.2 output the GEO DATA as an 
Alignment Data Sample, I downloaded the data and I checked to see what 
that is. And I found that the BWA bam files were created. 

I converted the fastq data on Th2 immune cells to bam files and I 
checked to see how it works. And finally, I could obtain the dba.report
(). 

Thank you very much for all your advice and I really appreciate for your 
kindness. 

My Best Regards, 
Ken Tanaka

------------------------------------------------------------------------
------------------------
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-suse-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] DiffBind_1.6.2       Biobase_2.20.0       GenomicRanges_1.12.4
[4] IRanges_1.18.1       BiocGenerics_0.6.0  

loaded via a namespace (and not attached):
[1] RColorBrewer_1.0-5 amap_0.8-7         edgeR_3.2.3        gdata_2.12.
0.2    
[5] gplots_2.11.0.1    gtools_2.7.1       limma_3.16.5       stats4_3.0.
1      
[9] zlibbioc_1.6.0    


> th2 = dba(sampleSheet="th2diffbind.csv")
GATA3_Ab GATA3_Ab Th2 Resistant Full_Media 1 macs
H3K4me3 H3K4me3 Th2 Responsive Full_Media 1 macs
H3K9Ac H3K9Ac Th2 Responsive Full_Media 1 macs


> th2
3 Samples, 16806 sites in matrix (35676 total):
        ID   Tissue Factor  Condition  Treatment Replicate Peak.caller
1 GATA3_Ab GATA3_Ab    Th2  Resistant Full_Media         1        macs
2  H3K4me3  H3K4me3    Th2 Responsive Full_Media         1        macs
3   H3K9Ac   H3K9Ac    Th2 Responsive Full_Media         1        macs
  Intervals
1       464
2     25875
3     32569


> th2 = dba.count(th2,minOverlap=3, bParallel=F)
Sample: databed/Th2_GATA3_Ab.bed.gz

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: .Call("croi_load_reads", as.character(bamfile), as.integer(
insertLength))
 2: pv.getCounts(job, bed, insertLength, bWithoutDupes = bWithoutDupes,  
   bLowMem, yieldSize, mode, singleEnd, scanbamparam)
 3: pv.listadd(results, pv.getCounts(job, bed, insertLength, 
bWithoutDupes = bWithoutDupes,     bLowMem, yieldSize, mode, singleEnd, 
scanbamparam))
 4: pv.counts(DBA, peaks = peaks, minOverlap = minOverlap, defaultScore 
= score,     bLog = bLog, insertLength = insertLength, bOnlyCounts = T,  
   bCalledMasks = bCalledMasks, minMaxval = filter, bParallel = 
bParallel,     bUseLast = bUseLast, bWithoutDupes = bRemoveDuplicates, 
bScaleControl = bScaleControl,     filterFun = filterFun, bLowMem = 
bLowMem)
 5: dba.count(th2, minOverlap = 3, bParallel = F)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 

------------------------------------------------------------------------
------------------------

# cat th2diffbind.csv
SampleID,Tissue,Factor,Condition,Treatment,Replicate,bamReads,bamControl,
ControlID,Peaks,PeakCaller,PeakFormat
GATA3_Ab,GATA3_Ab,Th2,Resistant,Full_Media,1,databed/Th2_GATA3_Ab.bed.gz,
databed/Th2_WCE.bed.gz,Th2_WCE_Control,peaks/GATA3_Ab_peaks.xls,macs,
macs
H3K4me3,H3K4me3,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_H3K4me3.
bed.gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/H3K4me3_peaks.
xls,macs,macs
H3K9Ac,H3K9Ac,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_H3K9Ac.bed.
gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/H3K9Ac_peaks.xls,
macs,macs


#zmore Th2_GATA3_Ab.bed.gz |head
------> Th2_GATA3_Ab.bed.gz <------
chrY    28285   28404   Th2_GATA3_Ab    1
chrY    28363   28482   Th2_GATA3_Ab    1
chrY    28466   28585   Th2_GATA3_Ab    1
chrY    30469   30588   Th2_GATA3_Ab    1
chrY    99937   100056  Th2_GATA3_Ab    1
chrY    114597  114716  Th2_GATA3_Ab    1
chrY    269098  269217  Th2_GATA3_Ab    1
chrY    269773  269892  Th2_GATA3_Ab    1
chrY    410005  410124  Th2_GATA3_Ab    1


# head -30 GATA3_Ab_peaks.xls
# This file is generated by MACS version 1.4.2 20120305
# ARGUMENTS LIST:
# name = GATA3_Ab
# format = BED
# ChIP-seq file = ../databed/Tst_GATA3_Ab.bed
# control file = ../databed/Tst_WCE.bed
# effective genome size = 1.87e+09
# band width = 300
# model fold = 10,30
# pvalue cutoff = 1.00e-05
# Large dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# tag size is determined as 119 bps
# total tags in treatment: 353649
# tags after filtering in treatment: 353649
# maximum duplicate tags at the same position in treatment = 1
# Redundant rate in treatment: 0.00
# total tags in control: 481971
# tags after filtering in control: 481971
# maximum duplicate tags at the same position in control = 1
# Redundant rate in control: 0.00
# d = 200
chr     start   end     length  summit  tags    -10*log10(pvalue)       
fold_enrichment FDR(%)
chr1    3704028 3704311 284     142     4       53.27   19.47   13.74
chr1    3787290 3787781 492     299     4       69.45   68.14   10.89
chr1    5694647 5695942 1296    189     11      104.10  25.55   8.33
chr1    5794055 5794296 242     121     4       85.91   38.94   4.17
chr1    6125574 6126093 520     337     7       56.83   16.22   13.71
chr1    6444066 6445227 1162    558     19      63.10   10.22   14.29

------------------------------
------------------------------------------------------------------

----- Original Message -----
> Hi, Ken,
> 
> This bug is fixed in the current version of DiffBind, so if you 
upgrade,
> it should go away.  If upgrading is not feasible, ensure that the bed
> files all have 6 columns to work around the bug.
> 
> Cheers,
> 
>  - Gord
> 
> 
> >Message: 23
> >Date: Sat, 25 May 2013 16:55:00 +0900
> >From: <kentanaka at chiba-u.jp>
> >To: <bioconductor at r-project.org>
> >Subject: [BioC] About the DiffBind dba.count() crash problems
> >Message-ID: <20130525075500.0000701E.0473 at chiba-u.jp>
> >Content-Type: text/plain; charset=UTF-8
> >
> >Hi, I'm Ken Tanaka.
> >
> >I'm currently interested in analyzing the DiffBind analysis by using 
the
> >ChIP-seq data from Th2 immune cell samples.
> >
> >To be more specific, I would like to analyze this data (GSE28292) by
> >using DiffBind analysis.
> >
> >I have questions regarding the dba.count().
> >When I execute the dba.count(), it crashes.
> >
> >The bed data which I'm using doesn't include the 6th strand column.
> >So, I suppose the crash problem doesn't originate from the problems
> >regarding the columns.
> >
> >I would like to know how to modify the bed data which the DiffBind 
can
> >read the bed file specifications.
> >If you can inform me of these DiffBind bed file specifications which 
can
> >read the bed data, I think I will be able to make the perl script for
> >conversions. 
> >So, could you kindly please let me know of these DiffBind bed file
> >specifications which can read the bed data?
> >
> >I attached below the data and logs which I used for this analysis as
> >follows. 
> >
> >My Best Regards, 
> >Ken Tanaka
> >
> >----------------------------------------------------------------
> ># ChIP-seq bed data files.
> >GSM773482_Th2_GATA3_Ab.bed.gz
> >GSM773480_Th2_control_Ab.bed.gz
> >GSM773484_Th2_WCE.bed.gz     (The 2 bed files listed above are the
> >controls.)
> >
> >GSM773486_Th2_WT_anti_H3K27me3.bed.gz
> >GSM773490_Th2_WT_anti_H3K9Ac.bed.gz
> >GSM773492_Th2_WT_anti_H3K4me3.bed.gz
> >GSM773488_Th2_WT_input.bed.gz (The 3 bed files listed above are the
> >controls.)
> >
> >
> ># macs14 1.4.2 20120305 peak calling output files.
> >GATA3_Ab_peaks.bed
> >control_Ab_peaks.bed
> >
> >H3K27me3_peaks.bed
> >H3K4me3_peaks.bed
> >H3K9Ac_peaks.bed
> >
> >
> ># DiffBind sampleSheet file.
> >%cat th2diffbind.csv
> >SampleID,Tissue,Factor,Condition,Treatment,Replicate,bamReads,
bamControl,
> >ControlID,Peaks,PeakCaller,PeakFormat
> >GATA3_Ab,GATA3_Ab,Th2,Resistant,Full_Media,1,databed/Th2_GATA3_Ab.bed.
gz,
> >databed/Th2_WCE.bed.gz,Th2_WCE_Control,peaks/GATA3_Ab_peaks.bed,macs,
raw
> >control_Ab,control_Ab,Th2,Resistant,Full_Media,1,databed/Th2_control_
Ab.
> >bed.gz,databed/Th2_WCE.bed.gz,Th2_WCE_Control,peaks/control_Ab_peaks.
bed,
> >macs,raw
> >H3K27me3,H3K27me3,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_
> >H3K27me3.bed.gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/
> >H3K27me3_peaks.bed,macs,raw
> >H3K4me3,H3K4me3,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_
H3K4me3.
> >bed.gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/H3K4me3_peaks.
> >bed,macs,raw
> >H3K9Ac,H3K9Ac,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_H3K9Ac.
bed.
> >gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/H3K9Ac_peaks.bed,
> >macs,raw
> >
> >
> >
> >
> >> th2 = dba(sampleSheet="th2diffbind.csv")
> >GATA3_Ab GATA3_Ab Th2 Resistant Full_Media 1 macs
> >control_Ab control_Ab Th2 Resistant Full_Media 1 macs
> >H3K27me3 H3K27me3 Th2 Responsive Full_Media 1 macs
> >H3K4me3 H3K4me3 Th2 Responsive Full_Media 1 macs
> >H3K9Ac H3K9Ac Th2 Responsive Full_Media 1 macs
> >> 
> >> #th2
> >> #str(th2)
> >> #plot(th2)
> >> 
> >> # peaks counting reads
> >> #th2 = dba.count(th2, bParallel=F)
> >> th2 = dba.count(th2,minOverlap=3, bParallel=F)
> >Sample: databed/Th2_GATA3_Ab.bed.gz
> >
> > *** caught segfault ***
> >address 0x10, cause 'memory not mapped'
> >
> >Traceback:
> > 1: .Call("croi_load_reads", as.character(bamfile), as.integer(
> >insertLength))
> > 2: pv.getCounts(job, bed, insertLength, bWithoutDupes = 
bWithoutDupes)
> > 3: pv.listadd(results, pv.getCounts(job, bed, insertLength,
> >bWithoutDupes = bWithoutDupes))
> > 4: pv.counts(DBA, peaks = peaks, minOverlap = minOverlap, 
defaultScore
> >= score,     bLog = bLog, insertLength = insertLength, bOnlyCounts = 
T,
> >   bCalledMasks = bCalledMasks, minMaxval = maxFilter, bParallel =
> >bParallel,     bUseLast = bUseLast, bWithoutDupes = bRemoveDuplicates,
> >bScaleControl = bScaleControl)
> > 5: dba.count(th2, minOverlap = 3, bParallel = F)
> >
> >Possible actions:
> >1: abort (with core dump, if enabled)
> >2: normal R exit
> >3: exit R without saving workspace
> >4: exit R saving workspace
> >Selection: 1
> >
> >
> >
> >> sessionInfo()
> >R version 2.15.2 (2012-10-26)
> >Platform: x86_64-suse-linux-gnu (64-bit)
> >
> >locale:
> > [1] LC_CTYPE=ja_JP.UTF-8       LC_NUMERIC=C
> > [3] LC_TIME=ja_JP.UTF-8        LC_COLLATE=ja_JP.UTF-8
> > [5] LC_MONETARY=ja_JP.UTF-8    LC_MESSAGES=ja_JP.UTF-8
> > [7] LC_PAPER=C                 LC_NAME=C
> > [9] LC_ADDRESS=C               LC_TELEPHONE=C
> >[11] LC_MEASUREMENT=ja_JP.UTF-8 LC_IDENTIFICATION=C
> >
> >attached base packages:
> >[1] stats     graphics  grDevices utils     datasets  methods   base
> >
> >other attached packages:
> >[1] DiffBind_1.4.2       Biobase_2.18.0       GenomicRanges_1.10.7
> >[4] IRanges_1.16.6       BiocGenerics_0.4.0
> >
> >loaded via a namespace (and not attached):
> > [1] RColorBrewer_1.0-5 amap_0.8-7         edgeR_3.0.8        gdata_2.
12.
> >0      
> > [5] gplots_2.11.0      gtools_2.7.0       limma_3.14.4       
parallel_2.
> >15.2   
> > [9] stats4_2.15.2      zlibbioc_1.4.0
> >> 
> >---------------------------------------------------------------------
---
> >---------
> >
> >--------------------------------------
> >Ken Tanaka
> >MD-PhD Candidate
> >Chiba University Medical School
> >
> >
> >
> >------------------------------
> >
> >_______________________________________________
> >Bioconductor mailing list
> >Bioconductor at r-project.org
> >https://stat.ethz.ch/mailman/listinfo/bioconductor
> >
> >
> >End of Bioconductor Digest, Vol 123, Issue 26
> >*********************************************
> 
>



More information about the Bioconductor mailing list