[BioC] DiffBind -error with dba.counts
Anitha Sundararajan
asundara at ncgr.org
Tue Sep 17 19:01:23 CEST 2013
Hi Gordon
Please see below the session info:
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] DiffBind_1.6.2 Biobase_2.20.1 GenomicRanges_1.12.5 IRanges_1.18.3 BiocGenerics_0.6.0 BiocInstaller_1.10.3
loaded via a namespace (and not attached):
[1] amap_0.8-7 edgeR_3.2.4 gdata_2.13.2 gplots_2.11.3 gtools_3.0.0 limma_3.16.7 RColorBrewer_1.0-5 stats4_3.0.1
[9] tools_3.0.1 zlibbioc_1.6.0
I have anywhere from 30 to 55 million reads for my samples. Yes, everything
else on the machine does slow down quite a bit.
I am running R locally for now, as we do not have R 3.0.1 installed on the
command line. Not sure if that matters.
Thanks for all your help.
Anitha
On 9/17/13 3:05 AM, Gordon Brown wrote:
> Hi, Anitha,
>
> What version of Bioconductor/DiffBind are you running, and how much memory
> does your computer have? Older versions of DiffBind use a *lot* of memory
> in the counting stage, so if your computer is short on RAM, it could
> easily run out of memory and start swapping to disk, which will slow it
> down by orders of magnitude. Does everything else on the machine slow
> down as well?
>
> Can you pass along the output from the "sessionInfo()" command?
>
> And if possible, upgrade to the latest version of DiffBind (if you're not
> there already) and try the "bLowMem" option on dba.count.
>
> Other than that, I can't think of any reason it should take hours, unless
> you have *really* big data files. How many reads are in them, roughly?
>
> - Gord
>
>
> On 2013-09-16 21:21, "Anitha Sundararajan" <asundara at ncgr.org> wrote:
>
>> Sorry, I did try minOverlap=2 (I didn't correct that when I wrote the
>> email, my bad)
>>
>>
>> On 9/16/13 1:59 PM, Anitha Sundararajan wrote:
>>> Hi Gordon
>>>
>>> I am now trying to run both reps for each sample, despite their low
>>> correlation. When I try the
>>>
>>>> B73.H3K4=dba.count(B73.H3K4, minOverlap=3)
>>> the R-session just freezes and there is no response for hours. I am
>>> not sure if there is anything wrong with any of my input files. The
>>> sample sheet gets read in fine without any errors.
>>>
>>> Just FYI, my bed file (from MACS2) looks like:
>>>
>>>
>>> chr1 9128 9552 MACS_peak_1 105.25
>>> chr1 9918 10127 MACS_peak_2 4.72
>>> chr1 79482 79691 MACS_peak_3 5.10
>>> chr1 86963 87514 MACS_peak_4 50.23
>>> chr1 94579 94781 MACS_peak_5 5.10
>>> chr1 103763 103997 MACS_peak_6 5.10
>>> chr1 110722 111047 MACS_peak_7 97.69
>>> chr1 144929 145568 MACS_peak_8 127.78
>>> chr1 161344 162320 MACS_peak_9 136.89
>>> chr1 222479 223058 MACS_peak_10 77.67
>>> chr1 227130 227628 MACS_peak_11 17.02
>>> chr1 263835 263971 MACS_peak_12 12.60
>>> chr1 264068 264518 MACS_peak_13 58.01
>>> chr1 264625 265056 MACS_peak_14 68.16
>>> chr1 270509 271086 MACS_peak_15 47.15
>>> chr1 277629 277789 MACS_peak_16 13.25
>>>
>>> Not sure if this is the problem?
>>>
>>> Thanks so much.
>>>
>>> Anitha
>>>
>>> On 9/16/13 3:51 AM, Gordon Brown wrote:
>>>> Hi, Anitha,
>>>>
>>>> The basic problem is that you have two samples, but you're asking for a
>>>> minOverlap of 3 (i.e. for peaks which occur in at least 3 samples). No
>>>> locations can satisfy that criterion, so you end up with an empty set of
>>>> peaks.
>>>>
>>>> The message is obscure, I will admit. (It happens because DiffBind writes
>>>> out the unified set of peaks and reads it back in, for tedious
>>>> implementation reasons, and when it reads it back in, there are no peaks,
>>>> hence "no lines available in input".)
>>>>
>>>> Try using minOverlap=2. But... having said that, I'm not sure how useful
>>>> DiffBind will be to you, without replicates.
>>>>
>>>> Cheers,
>>>>
>>>> - Gord Brown
>>>>
>>>>
>>>>
>>>>> Message: 22
>>>>> Date: Fri, 13 Sep 2013 12:21:02 -0600
>>>>> From: Anitha Sundararajan <asundara at ncgr.org>
>>>>> To: bioconductor at r-project.org
>>>>> Subject: [BioC] DiffBind -error with dba.counts
>>>>> Message-ID: <5233578E.3090701 at ncgr.org>
>>>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>>>
>>>>> Hi
>>>>>
>>>>> I have been trying to use DiffBind to analyze our ChIP-seq data and have
>>>>> been running into some errors repeatedly.
>>>>>
>>>>> I first created a samplesheet.csv describing my samples and it looks
>>>>> like this:
>>>>>
>>>>>
>>>>> SampleID,Tissue,Factor,Condition,Replicate,bamReads,bamControl,Peaks,PeakCaller
>>>>> meio.1,meiocytes,H3K4me3,N,1,M_meiocytes_H3K4me3.bam,InM_input_meiocytes.bam,meio.vs.in.rep1.def_peaks.bed,MACS
>>>>> seed.1,seedlings,H3K4me3,N,1,S_seedling_H3K4me3.bam,InS_input_seedling.bam,seed.vs.in.rep1.def_peaks.bed,MACS
>>>>>
>>>>>
>>>>> I only have two samples (and their respective inputs) with one rep each,
>>>>> and the peaks were called using MACS v2. The peak caller generated .bed
>>>>> files, which were used in DiffBind.
>>>>>
>>>>>
>>>>> I defined the working directory in R first.
>>>>>
>>>>> I then read the sample sheet in :
>>>>>> H3K4.B73=dba(sampleSheet='samplesheet2.csv',peakFormat='bed')
>>>>>> H3K4.B73
>>>>> 2 Samples, 38870 sites in matrix (45304 total):
>>>>> ID Tissue Factor Condition Replicate Peak.caller Intervals
>>>>> 1 meio.1 meiocytes H3K4me3 N 1 MACS 44124
>>>>> 2 seed.1 seedlings H3K4me3 N 1 MACS 41596
>>>>>
>>>>> generated a plot,
>>>>>> plot(H3K4.B73)
>>>>> And then when I tried to run dba.count, it failed every time. I
>>>>> searched the list for similar posts and could not find a solution. I
>>>>> tried the following command:
>>>>>
>>>>>> H3K4.B73=dba.count(H3K4.B73, minOverlap=3)
>>>>> and this,
>>>>>> H3K4.B73=dba.count(H3K4.B73, minOverlap=3, bLowMem=TRUE)
>>>>>> H3K4.B73=dba.count(H3K4.B73, minOverlap=3, bLowMem=FALSE)
>>>>> And they all failed.
>>>>>
>>>>> My error in all three cases is as follows:
>>>>> Error in read.table(fn, skip = skipnum) : no lines available in input
>>>>>
>>>>> Please let me know if you have any insights on it.
>>>>>
>>>>> Thanks so much for your help in advance.
>>>>>
>>>>> Anitha Sundararajan Ph.D.
>>>>> Research Scientist
>>>>> National Center for Genome Resources
>>>>> Santa Fe, NM 87505
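
Gord's explanation of the failure quoted above can be reproduced in base R without DiffBind. The sketch below is only an illustration under stated assumptions: the peak coordinates, the candidate regions, and the helper names support() and consensus() are made up for this example and are not DiffBind's internals. It counts how many samples support each region, filters by a minimum overlap, writes the result to a file, and reads it back, which is the sequence of steps described in the thread.

```r
# Toy peak sets for two samples (coordinates are invented for illustration).
peaks <- list(
  meio.1 = data.frame(chr = "chr1", start = c(100, 500, 900),
                      end = c(200, 600, 1000), stringsAsFactors = FALSE),
  seed.1 = data.frame(chr = "chr1", start = c(150, 700, 950),
                      end = c(250, 800, 1050), stringsAsFactors = FALSE)
)

# Hand-written candidate regions standing in for a merged peak set.
regions <- data.frame(chr = "chr1", start = c(100, 500, 700, 900),
                      end = c(250, 600, 800, 1050), stringsAsFactors = FALSE)

# Hypothetical helper: for each region, count the samples that have at
# least one overlapping peak.
support <- function(regions, peaks) {
  vapply(seq_len(nrow(regions)), function(i) {
    sum(vapply(peaks, function(p) {
      any(p$chr == regions$chr[i] &
          p$start <= regions$end[i] &
          p$end >= regions$start[i])
    }, logical(1)))
  }, integer(1))
}

# Hypothetical helper: keep regions supported by at least min_overlap samples.
consensus <- function(min_overlap) {
  regions[support(regions, peaks) >= min_overlap, ]
}

nrow(consensus(2))  # 2 regions are shared by both samples
nrow(consensus(3))  # 0, since only two samples exist

# Write the empty set out and read it back, capturing the error message.
fn <- tempfile(fileext = ".bed")
write.table(consensus(3), fn, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)
msg <- tryCatch(read.table(fn), error = function(e) conditionMessage(e))
print(msg)
```

With two samples, no region can have support of 3, so the file written is empty and read.table() stops with the same "no lines available in input" message reported in the original post. Setting the threshold no higher than the number of samples (here 2) avoids the empty set.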