[BioC] error loading dba in DiffBind

Wed Jun 19 19:58:24 CEST 2013

Gord,

Excellent!  That did the trick. 

Thanks for your help.

Thanks,
Nathan

On Jun 19, 2013, at 12:23 PM, Gordon Brown <Gordon.Brown at cruk.cam.ac.uk> wrote:

> Hi,
> 
> Turns out your sample sheet doesn't specify the peak format or peak
> caller.  If you try again with:
> 
>> x = dba(sampleSheet='AVbothChr6.csv',peakFormat='bed')
> 
> you should be able to create the DBA object.  The surprising thing is that
> it worked as-is in R 2.15.  Maybe Rory changed the default peak format...
> not sure.
> 
> Anyway, let me know if you have further trouble.
> 
> Cheers,
> 
> - Gord
> 
> 
> On 2013-06-19 16:51, "Lawson, Nathan" <Nathan.Lawson at umassmed.edu> wrote:
> 
>> 
>> Gord,
>> 
>> Thanks for the quick reply.
>> 
>> Attached is a .zip file with all of the peaksets (they are not so big
>> since they are only from one chromosome) and the sample sheet file.
>> 
>> Below is session info from our cluster as well as from my computer (this
>> is a run from the terminal, but I have also run it with the R64 GUI
>> console with no problem).
>> 
>> 
>> 
>> Session info from HPCC cluster:
>> 
>>> sessionInfo()
>> R version 3.0.1 (2013-05-16)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>> 
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C                 LC_NAME=C
>> [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>> 
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base     
>> 
>> other attached packages:
>> [1] DiffBind_1.6.2       Biobase_2.20.0       GenomicRanges_1.12.4
>> [4] IRanges_1.18.1       BiocGenerics_0.6.0
>> 
>> loaded via a namespace (and not attached):
>> [1] amap_0.8-7         edgeR_3.2.3        gdata_2.12.0.2     gplots_2.11.0.1
>> [5] gtools_2.7.1       limma_3.16.5       RColorBrewer_1.0-5 stats4_3.0.1
>> [9] zlibbioc_1.6.0
>> 
>> 
>> 
>> 
>> session info from my computer:
>> 
>>> sessionInfo()
>> R version 2.15.1 (2012-06-22)
>> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>> 
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>> 
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>> 
>> other attached packages:
>> [1] DiffBind_1.4.2       Biobase_2.18.0       GenomicRanges_1.10.7
>> [4] IRanges_1.16.6       BiocGenerics_0.4.0
>> 
>> loaded via a namespace (and not attached):
>> [1] amap_0.8-7         edgeR_3.0.8        gdata_2.12.0       gplots_2.11.0
>> [5] gtools_2.7.1       limma_3.14.4       parallel_2.15.1    RColorBrewer_1.0-5
>> [9] stats4_2.15.1      zlibbioc_1.4.0
>> 
>> Thanks,
>> Nathan
>> 
>> On Jun 19, 2013, at 11:24 AM, Gordon Brown <Gordon.Brown at cruk.cam.ac.uk> wrote:
>> 
>>> Hi, Nathan,
>>> 
>>> I haven't seen messages like these.  The warnings suggest that numbers are
>>> being interpreted as strings and converted to factors, but I can't imagine
>>> why that would happen.  Can you send along your sample sheet and the first
>>> few lines of your peaks files?  I'll see if I can reproduce it.  Also, can
>>> you let me know the sessionInfo() output and the operating system and
>>> version for both your computer and the cluster?
>>> 
>>> In case it helps, "dba.count" now has a "bLowMem" option that greatly
>>> reduces its memory usage; perhaps you will be able to get further on your
>>> local machine using that option.  You'll have to upgrade to R
>>> 3.0.1/Bioconductor 2.12, though.
>>> 
>>> Cheers,
>>> 
>>> - Gord
>>> 
>>> On 2013-06-19 15:23, "Lawson, Nathan" <Nathan.Lawson at umassmed.edu> wrote:
>>> 
>>>> 
>>>> I am using DiffBind to identify differentially occupied elements from
>>>> histone modification ChIP-Seq between two different cell lines.
>>>> 
>>>> I successfully ran the package on my own computer with peaks and mapped
>>>> reads limited to a single human chromosome as a test.  The analysis ran
>>>> nicely and the results looked good.  Unfortunately, I was not able to run
>>>> a full genome-wide analysis due to computational and space limitations.
>>>> Therefore, I tried to run the analysis on our high performance computing
>>>> cluster, which is when the error appeared.
>>>> 
>>>> When running the SAME EXACT set of files, including the same sample
>>>> sheet, on our cluster, I get the following error output, as well as
>>>> additional warnings that I have not seen previously:
>>>> 
>>>>> AV = dba(sampleSheet="AVbothChr6.csv")
>>>> A1.0 HUAEC K27ac artery  1 raw
>>>> A1.1 HUAEC K27ac artery  2 raw
>>>> V1.0 HUVEC K27ac vein  1 raw
>>>> V1.1 HUVEC K27ac vein  2 raw
>>>> A1.2 HUAEC p300 artery  1 raw
>>>> A1.3 HUAEC p300 artery  2 raw
>>>> V1.3 HUVEC p300 vein  1 raw
>>>> V2.6 HUVEC p300 vein  2 raw
>>>> Error in if (res >= minval) { : missing value where TRUE/FALSE needed
>>>> In addition: Warning messages:
>>>> 1: In Ops.factor(peaks[, pCol], width) : / not meaningful for factors
>>>> 2: In Ops.factor(peaks[, pCol], width) : / not meaningful for factors
>>>> 3: In Ops.factor(peaks[, pCol], width) : / not meaningful for factors
>>>> 4: In Ops.factor(peaks[, pCol], width) : / not meaningful for factors
>>>> 5: In Ops.factor(peaks[, pCol], width) : / not meaningful for factors
>>>> 6: In Ops.factor(peaks[, pCol], width) : / not meaningful for factors
>>>> 7: In Ops.factor(peaks[, pCol], width) : / not meaningful for factors
>>>> 8: In Ops.factor(peaks[, pCol], width) : / not meaningful for factors
>>>>> 
>>>> 
>>>> 
>>>> Again, these datasets were successfully entered as a dba object and
>>>> subsequently analyzed using DiffBind on my computer.  The original input
>>>> files were simply transferred to the cluster to re-test there.  The only
>>>> difference (I can see) is that the cluster is currently running R-3.0.1
>>>> and I was running 2.15.
>>>> 
>>>> I also tried making the peaks.bed files into a 6-column format (the
>>>> original bed files only had 5 columns), but this did not seem to solve
>>>> the problem.
>>>> 
>>>> Any suggestions are welcome.
>>>> 
>>>> Thanks,
>>>> Nathan
>>>> 
>>>> Nathan D. Lawson, Ph.D.
>>>> Associate Professor
>>>> Program in Gene Function and Expression
>>>> University of Massachusetts Medical School
>>>> 364 Plantation Street
>>>> LRB617
>>>> Worcester, MA 01605
>>>> website: lawsonlab.umassmed.edu
>>>> email: nathan.lawson at umassmed.edu
>>>> phone: 508-856-1177
>>>> 
>>> 
>> 
>> 
>> 
>
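
For anyone who hits the same messages: the sketch below uses base R only to show how a numeric column that has been read as text (and so turned into a factor) can produce both the "not meaningful for factors" warning and the "missing value where TRUE/FALSE needed" error quoted in the thread. It is an illustration under assumptions, not DiffBind's actual code; the names `scores`, `width`, `res`, `outcome` and `fixed` are made up for the example.

```r
# Hypothetical score column that was read as text and became a factor
# (string columns became factors by default in read.table() before R 4.0.0).
scores <- factor(c("12.5", "30.1", "7.8"))
width  <- c(100, 200, 50)

# Arithmetic on a factor returns NA and normally warns
# "'/' not meaningful for factors"; the warning is suppressed here.
res <- suppressWarnings(scores / width)
print(res)

# Using an NA as an `if` condition raises the error reported in the thread.
outcome <- tryCatch(
  {
    if (res[1] >= 0) "ok"
  },
  error = function(e) conditionMessage(e)
)
print(outcome)

# Converting via character first recovers the intended numeric values.
fixed <- as.numeric(as.character(scores)) / width
print(fixed)
```

In the thread itself the remedy was not a manual conversion but telling dba() how to read the peak files (peakFormat='bed'), so that the right column is treated as the score.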