[BioC] About the DiffBind dba.count() crash problems
Gordon Brown
Gordon.Brown at cruk.cam.ac.uk
Fri Jun 7 11:10:29 CEST 2013
Dear Ken,
Thanks for the bug report. Can you share your bed files with me (maybe
just the first 100 lines or so of each) so I can try to reproduce this
bug? It must be a different one from the previous bed reading bug.
Thanks,
- Gord
On 2013-06-06 23:50, "kentanaka at chiba-u.jp" <kentanaka at chiba-u.jp> wrote:
>Dear Dr. Brown,
>
>I asked the questions earlier regarding the DiffBind last week.
>
>I have versioned up to R3.0.1 and Bioconductor 2.12, and I checked to
>see how DiffBind 1.6.2 works. But the dba.count() still crashes in the
>bed data and I attached the logs of the bed data below.
>
>Since the versioned up DiffBind 1.6.2 output the GEO DATA as an
>Alignment Data Sample, I downloaded the data and I checked to see what
>that is. And I found that the BWA bam files were created.
>
>I converted the fastq data on Th2 immune cells to bam files and I
>checked to see how it works. And finally, I could obtain the dba.report
>().
>
>Thank you very much for all your advice and I really appreciate for your
>kindness.
>
>My Best Regards,
>Ken Tanaka
>
>------------------------------------------------------------------------
>------------------------
>> sessionInfo()
>R version 3.0.1 (2013-05-16)
>Platform: x86_64-suse-linux-gnu (64-bit)
>
>locale:
>[1] C
>
>attached base packages:
>[1] parallel stats graphics grDevices utils datasets methods
>[8] base
>
>other attached packages:
>[1] DiffBind_1.6.2 Biobase_2.20.0 GenomicRanges_1.12.4
>[4] IRanges_1.18.1 BiocGenerics_0.6.0
>
>loaded via a namespace (and not attached):
>[1] RColorBrewer_1.0-5 amap_0.8-7 edgeR_3.2.3 gdata_2.12.
>0.2
>[5] gplots_2.11.0.1 gtools_2.7.1 limma_3.16.5 stats4_3.0.
>1
>[9] zlibbioc_1.6.0
>
>
>> th2 = dba(sampleSheet="th2diffbind.csv")
>GATA3_Ab GATA3_Ab Th2 Resistant Full_Media 1 macs
>H3K4me3 H3K4me3 Th2 Responsive Full_Media 1 macs
>H3K9Ac H3K9Ac Th2 Responsive Full_Media 1 macs
>
>
>> th2
>3 Samples, 16806 sites in matrix (35676 total):
> ID Tissue Factor Condition Treatment Replicate Peak.caller
>1 GATA3_Ab GATA3_Ab Th2 Resistant Full_Media 1 macs
>2 H3K4me3 H3K4me3 Th2 Responsive Full_Media 1 macs
>3 H3K9Ac H3K9Ac Th2 Responsive Full_Media 1 macs
> Intervals
>1 464
>2 25875
>3 32569
>
>
>> th2 = dba.count(th2,minOverlap=3, bParallel=F)
>Sample: databed/Th2_GATA3_Ab.bed.gz
>
> *** caught segfault ***
>address (nil), cause 'memory not mapped'
>
>Traceback:
> 1: .Call("croi_load_reads", as.character(bamfile), as.integer(
>insertLength))
> 2: pv.getCounts(job, bed, insertLength, bWithoutDupes = bWithoutDupes,
> bLowMem, yieldSize, mode, singleEnd, scanbamparam)
> 3: pv.listadd(results, pv.getCounts(job, bed, insertLength,
>bWithoutDupes = bWithoutDupes, bLowMem, yieldSize, mode, singleEnd,
>scanbamparam))
> 4: pv.counts(DBA, peaks = peaks, minOverlap = minOverlap, defaultScore
>= score, bLog = bLog, insertLength = insertLength, bOnlyCounts = T,
> bCalledMasks = bCalledMasks, minMaxval = filter, bParallel =
>bParallel, bUseLast = bUseLast, bWithoutDupes = bRemoveDuplicates,
>bScaleControl = bScaleControl, filterFun = filterFun, bLowMem =
>bLowMem)
> 5: dba.count(th2, minOverlap = 3, bParallel = F)
>
>Possible actions:
>1: abort (with core dump, if enabled)
>2: normal R exit
>3: exit R without saving workspace
>4: exit R saving workspace
>Selection:
>
>------------------------------------------------------------------------
>------------------------
>
># cat th2diffbind.csv
>SampleID,Tissue,Factor,Condition,Treatment,Replicate,bamReads,bamControl,
>ControlID,Peaks,PeakCaller,PeakFormat
>GATA3_Ab,GATA3_Ab,Th2,Resistant,Full_Media,1,databed/Th2_GATA3_Ab.bed.gz,
>databed/Th2_WCE.bed.gz,Th2_WCE_Control,peaks/GATA3_Ab_peaks.xls,macs,
>macs
>H3K4me3,H3K4me3,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_H3K4me3.
>bed.gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/H3K4me3_peaks.
>xls,macs,macs
>H3K9Ac,H3K9Ac,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_H3K9Ac.bed.
>gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/H3K9Ac_peaks.xls,
>macs,macs
>
>
>#zmore Th2_GATA3_Ab.bed.gz |head
>------> Th2_GATA3_Ab.bed.gz <------
>chrY 28285 28404 Th2_GATA3_Ab 1
>chrY 28363 28482 Th2_GATA3_Ab 1
>chrY 28466 28585 Th2_GATA3_Ab 1
>chrY 30469 30588 Th2_GATA3_Ab 1
>chrY 99937 100056 Th2_GATA3_Ab 1
>chrY 114597 114716 Th2_GATA3_Ab 1
>chrY 269098 269217 Th2_GATA3_Ab 1
>chrY 269773 269892 Th2_GATA3_Ab 1
>chrY 410005 410124 Th2_GATA3_Ab 1
>
>
># head -30 GATA3_Ab_peaks.xls
># This file is generated by MACS version 1.4.2 20120305
># ARGUMENTS LIST:
># name = GATA3_Ab
># format = BED
># ChIP-seq file = ../databed/Tst_GATA3_Ab.bed
># control file = ../databed/Tst_WCE.bed
># effective genome size = 1.87e+09
># band width = 300
># model fold = 10,30
># pvalue cutoff = 1.00e-05
># Large dataset will be scaled towards smaller dataset.
># Range for calculating regional lambda is: 1000 bps and 10000 bps
># tag size is determined as 119 bps
># total tags in treatment: 353649
># tags after filtering in treatment: 353649
># maximum duplicate tags at the same position in treatment = 1
># Redundant rate in treatment: 0.00
># total tags in control: 481971
># tags after filtering in control: 481971
># maximum duplicate tags at the same position in control = 1
># Redundant rate in control: 0.00
># d = 200
>chr start end length summit tags -10*log10(pvalue)
>fold_enrichment FDR(%)
>chr1 3704028 3704311 284 142 4 53.27 19.47 13.74
>chr1 3787290 3787781 492 299 4 69.45 68.14 10.89
>chr1 5694647 5695942 1296 189 11 104.10 25.55 8.33
>chr1 5794055 5794296 242 121 4 85.91 38.94 4.17
>chr1 6125574 6126093 520 337 7 56.83 16.22 13.71
>chr1 6444066 6445227 1162 558 19 63.10 10.22 14.29
>
>------------------------------
>------------------------------------------------------------------
>
>----- Original Message -----
>> Hi, Ken,
>>
>> This bug is fixed in the current version of DiffBind, so if you
>upgrade,
>> it should go away. If upgrading is not feasible, ensure that the bed
>> files all have 6 columns to work around the bug.
>>
>> Cheers,
>>
>> - Gord
>>
>>
>> >Message: 23
>> >Date: Sat, 25 May 2013 16:55:00 +0900
>> >From: <kentanaka at chiba-u.jp>
>> >To: <bioconductor at r-project.org>
>> >Subject: [BioC] About the DiffBind dba.count() crash problems
>> >Message-ID: <20130525075500.0000701E.0473 at chiba-u.jp>
>> >Content-Type: text/plain; charset=UTF-8
>> >
>> >Hi, I'm Ken Tanaka.
>> >
>> >I'm currently interested in analyzing the DiffBind analysis by using
>the
>> >ChIP-seq data from Th2 immune cell samples.
>> >
>> >To be more specific, I would like to analyze this data (GSE28292) by
>> >using DiffBind analysis.
>> >
>> >I have questions regarding the dba.count().
>> >When I execute the dba.count(), it crashes.
>> >
>> >The bed data which I'm using doesn't include the 6th strand column.
>> >So, I suppose the crash problem doesn't originate from the problems
>> >regarding the columns.
>> >
>> >I would like to know how to modify the bed data which the DiffBind
>can
>> >read the bed file specifications.
>> >If you can inform me of these DiffBind bed file specifications which
>can
>> >read the bed data, I think I will be able to make the perl script for
>> >conversions.
>> >So, could you kindly please let me know of these DiffBind bed file
>> >specifications which can read the bed data?
>> >
>> >I attached below the data and logs which I used for this analysis as
>> >follows.
>> >
>> >My Best Regards,
>> >Ken Tanaka
>> >
>> >----------------------------------------------------------------
>> ># ChIP-seq bed data files.
>> >GSM773482_Th2_GATA3_Ab.bed.gz
>> >GSM773480_Th2_control_Ab.bed.gz
>> >GSM773484_Th2_WCE.bed.gz (The 2 bed files listed above are the
>> >controls.)
>> >
>> >GSM773486_Th2_WT_anti_H3K27me3.bed.gz
>> >GSM773490_Th2_WT_anti_H3K9Ac.bed.gz
>> >GSM773492_Th2_WT_anti_H3K4me3.bed.gz
>> >GSM773488_Th2_WT_input.bed.gz (The 3 bed files listed above are the
>> >controls.)
>> >
>> >
>> ># macs14 1.4.2 20120305 peak calling output files.
>> >GATA3_Ab_peaks.bed
>> >control_Ab_peaks.bed
>> >
>> >H3K27me3_peaks.bed
>> >H3K4me3_peaks.bed
>> >H3K9Ac_peaks.bed
>> >
>> >
>> ># DiffBind sampleSheet file.
>> >%cat th2diffbind.csv
>> >SampleID,Tissue,Factor,Condition,Treatment,Replicate,bamReads,
>bamControl,
>> >ControlID,Peaks,PeakCaller,PeakFormat
>> >GATA3_Ab,GATA3_Ab,Th2,Resistant,Full_Media,1,databed/Th2_GATA3_Ab.bed.
>gz,
>> >databed/Th2_WCE.bed.gz,Th2_WCE_Control,peaks/GATA3_Ab_peaks.bed,macs,
>raw
>> >control_Ab,control_Ab,Th2,Resistant,Full_Media,1,databed/Th2_control_
>Ab.
>> >bed.gz,databed/Th2_WCE.bed.gz,Th2_WCE_Control,peaks/control_Ab_peaks.
>bed,
>> >macs,raw
>> >H3K27me3,H3K27me3,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_
>> >H3K27me3.bed.gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/
>> >H3K27me3_peaks.bed,macs,raw
>> >H3K4me3,H3K4me3,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_
>H3K4me3.
>> >bed.gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/H3K4me3_peaks.
>> >bed,macs,raw
>> >H3K9Ac,H3K9Ac,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_H3K9Ac.
>bed.
>> >gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/H3K9Ac_peaks.bed,
>> >macs,raw
>> >
>> >
>> >
>> >
>> >> th2 = dba(sampleSheet="th2diffbind.csv")
>> >GATA3_Ab GATA3_Ab Th2 Resistant Full_Media 1 macs
>> >control_Ab control_Ab Th2 Resistant Full_Media 1 macs
>> >H3K27me3 H3K27me3 Th2 Responsive Full_Media 1 macs
>> >H3K4me3 H3K4me3 Th2 Responsive Full_Media 1 macs
>> >H3K9Ac H3K9Ac Th2 Responsive Full_Media 1 macs
>> >>
>> >> #th2
>> >> #str(th2)
>> >> #plot(th2)
>> >>
>> >> # peaks counting reads
>> >> #th2 = dba.count(th2, bParallel=F)
>> >> th2 = dba.count(th2,minOverlap=3, bParallel=F)
>> >Sample: databed/Th2_GATA3_Ab.bed.gz
>> >
>> > *** caught segfault ***
>> >address 0x10, cause 'memory not mapped'
>> >
>> >Traceback:
>> > 1: .Call("croi_load_reads", as.character(bamfile), as.integer(
>> >insertLength))
>> > 2: pv.getCounts(job, bed, insertLength, bWithoutDupes =
>bWithoutDupes)
>> > 3: pv.listadd(results, pv.getCounts(job, bed, insertLength,
>> >bWithoutDupes = bWithoutDupes))
>> > 4: pv.counts(DBA, peaks = peaks, minOverlap = minOverlap,
>defaultScore
>> >= score, bLog = bLog, insertLength = insertLength, bOnlyCounts =
>T,
>> > bCalledMasks = bCalledMasks, minMaxval = maxFilter, bParallel =
>> >bParallel, bUseLast = bUseLast, bWithoutDupes = bRemoveDuplicates,
>> >bScaleControl = bScaleControl)
>> > 5: dba.count(th2, minOverlap = 3, bParallel = F)
>> >
>> >Possible actions:
>> >1: abort (with core dump, if enabled)
>> >2: normal R exit
>> >3: exit R without saving workspace
>> >4: exit R saving workspace
>> >Selection: 1
>> >
>> >
>> >
>> >> sessionInfo()
>> >R version 2.15.2 (2012-10-26)
>> >Platform: x86_64-suse-linux-gnu (64-bit)
>> >
>> >locale:
>> > [1] LC_CTYPE=ja_JP.UTF-8 LC_NUMERIC=C
>> > [3] LC_TIME=ja_JP.UTF-8 LC_COLLATE=ja_JP.UTF-8
>> > [5] LC_MONETARY=ja_JP.UTF-8 LC_MESSAGES=ja_JP.UTF-8
>> > [7] LC_PAPER=C LC_NAME=C
>> > [9] LC_ADDRESS=C LC_TELEPHONE=C
>> >[11] LC_MEASUREMENT=ja_JP.UTF-8 LC_IDENTIFICATION=C
>> >
>> >attached base packages:
>> >[1] stats graphics grDevices utils datasets methods base
>> >
>> >other attached packages:
>> >[1] DiffBind_1.4.2 Biobase_2.18.0 GenomicRanges_1.10.7
>> >[4] IRanges_1.16.6 BiocGenerics_0.4.0
>> >
>> >loaded via a namespace (and not attached):
>> > [1] RColorBrewer_1.0-5 amap_0.8-7 edgeR_3.0.8 gdata_2.
>12.
>> >0
>> > [5] gplots_2.11.0 gtools_2.7.0 limma_3.14.4
>parallel_2.
>> >15.2
>> > [9] stats4_2.15.2 zlibbioc_1.4.0
>> >>
>> >---------------------------------------------------------------------
>---
>> >---------
>> >
>> >--------------------------------------
>> >Ken Tanaka
>> >MD-PhD Candidate
>> >Chiba University Medical School
>> >
>> >
>> >
>> >------------------------------
>> >
>> >_______________________________________________
>> >Bioconductor mailing list
>> >Bioconductor at r-project.org
>> >https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >
>> >
>> >End of Bioconductor Digest, Vol 123, Issue 26
>> >*********************************************
>>
>>
>
>
More information about the Bioconductor
mailing list