[Bioc-devel] GIntervalTree objects are corrupted during save/load

Hervé Pagès hpages at fhcrc.org
Tue Jul 1 18:05:31 CEST 2014


Hi Hector, Michael,

On 07/01/2014 05:57 AM, Michael Lawrence wrote:
> It seems tough to make this work. There is no way for the R serialization
> machinery to understand what needs to be serialized behind the external
> pointer. The easiest approach to fixing this would be to reimplement
> everything on top of SEXPs, which is to say, it would not be easy.

This is what I did with PDict objects to store the Aho-Corasick tree.
It's actually easier than it sounds. You can use any atomic type, say
INTSXP or RAWSXP; it doesn't matter, it's just a way to get memory.
Then you do what you want with it (by casting a pointer to that memory).
It not only solves the serialization problem, it also automatically
manages the memory, which is now in the hands of the garbage collector.
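Below is a minimal C sketch of this approach, built on stated assumptions. The layout and every name in it (Header, Node, tree_build, tree_count_overlaps, roundtrip_count, demo_starts, demo_ends) are hypothetical. This is not the IRanges interval tree code.

One constraint is implicit above: the memory must not contain raw addresses. An external pointer comes back from load() with a NULL address, which is consistent with the segfault at address 0xc in the report below. Any pointer stored inside the block would also point into the old session's memory after a reload. For that reason, the nodes here refer to their children by index.

In a package, the block would be RAW(x) for x = allocVector(RAWSXP, size), protected with PROTECT() until it is stored in a slot. Here, malloc() stands in so the sketch compiles without the R headers, and a memcpy() to a fresh buffer plays the part of save()/load().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout (not the IRanges one): only plain integers, and
   children are indices into the node array, never addresses. */
typedef struct { int32_t n, root; } Header;
typedef struct {
    int32_t start, end;   /* closed interval [start, end] */
    int32_t max_end;      /* largest end anywhere in this subtree */
    int32_t left, right;  /* child indices, -1 when absent */
} Node;

/* Hypothetical sample intervals, sorted by start. */
static const int32_t demo_starts[] = {1, 10, 15, 30};
static const int32_t demo_ends[]   = {5, 20, 25, 40};

/* Build a balanced tree over positions lo..hi; returns the subtree root index. */
static int32_t build(Node *nodes, const int32_t *starts, const int32_t *ends,
                     int32_t lo, int32_t hi) {
    if (lo > hi) return -1;
    int32_t mid = lo + (hi - lo) / 2;
    Node *nd = &nodes[mid];
    nd->start = starts[mid];
    nd->end = ends[mid];
    nd->left = build(nodes, starts, ends, lo, mid - 1);
    nd->right = build(nodes, starts, ends, mid + 1, hi);
    nd->max_end = nd->end;
    if (nd->left >= 0 && nodes[nd->left].max_end > nd->max_end)
        nd->max_end = nodes[nd->left].max_end;
    if (nd->right >= 0 && nodes[nd->right].max_end > nd->max_end)
        nd->max_end = nodes[nd->right].max_end;
    return mid;
}

/* Lay the whole tree out in one block of bytes. In a package the block would
   be RAW(x) for x = allocVector(RAWSXP, size); malloc() stands in here. */
static unsigned char *tree_build(const int32_t *starts, const int32_t *ends,
                                 int32_t n, size_t *size) {
    assert(n >= 0);
    *size = sizeof(Header) + (size_t)n * sizeof(Node);
    unsigned char *block = malloc(*size);
    if (block == NULL) return NULL;
    Header *h = (Header *)block;
    h->n = n;
    h->root = build((Node *)(block + sizeof(Header)), starts, ends, 0, n - 1);
    return block;
}

static int count_from(const Node *nodes, int32_t i, int32_t qs, int32_t qe) {
    if (i < 0 || nodes[i].max_end < qs) return 0;  /* nothing here reaches qs */
    int c = count_from(nodes, nodes[i].left, qs, qe);
    /* if this node starts after qe, so does every node in its right subtree */
    if (nodes[i].start <= qe) {
        if (nodes[i].end >= qs) c++;
        c += count_from(nodes, nodes[i].right, qs, qe);
    }
    return c;
}

/* Count intervals overlapping [qs, qe] using nothing but the block's bytes. */
static int tree_count_overlaps(const unsigned char *block, int32_t qs, int32_t qe) {
    const Header *h = (const Header *)block;
    return count_from((const Node *)(block + sizeof(Header)), h->root, qs, qe);
}

/* Mimic save()/load(): copy the bytes to a new address, free the original,
   then query the copy. Returns -1 if an allocation fails. */
static int roundtrip_count(const int32_t *starts, const int32_t *ends, int32_t n,
                           int32_t qs, int32_t qe) {
    size_t size;
    unsigned char *block = tree_build(starts, ends, n, &size);
    if (block == NULL) return -1;
    unsigned char *copy = malloc(size);
    if (copy == NULL) { free(block); return -1; }
    memcpy(copy, block, size);
    free(block);  /* the original address is gone, as after load() */
    int hits = tree_count_overlaps(copy, qs, qe);
    free(copy);
    return hits;
}
```

Because the nodes refer to each other by position, the bytes mean the same thing at any address. The relocated copy therefore answers queries exactly as the original would have. For example, roundtrip_count(demo_starts, demo_ends, 4, 12, 16) counts the two sample intervals that overlap [12, 16].

One caveat of this sketch: a RAWSXP stores the integers in native byte order. If saved files may move between machines with different endianness, an INTSXP is probably the safer carrier, since R converts integer vectors when it saves in its default XDR format.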

Cheers,
H.

> Alternatively, we could write our own serializer. It seems R needs a way to
> register (de)serializers for external pointers.
>
>
> On Tue, Jul 1, 2014 at 5:37 AM, Hector Corrada Bravo <hcorrada at gmail.com>
> wrote:
>
>> Confirmed. Will look into it now.
>> Thanks for writing!
>> Hector
>>
>>
>> On Tue, Jul 1, 2014 at 2:40 AM, Kristoffer Vitting-Seerup <
>> kristoffer.vittingseerup at bio.ku.dk> wrote:
>>
>>> Hi bioc-devel
>>>
>>> I’ve found an error when using GIntervalTree:
>>>
>>>> test <- GRanges(seqnames='Chr1', range=IRanges(start=10,end=20))
>>>> test
>>> GRanges with 1 range and 0 metadata columns:
>>>        seqnames    ranges strand
>>>           <Rle> <IRanges>  <Rle>
>>>    [1]     Chr1  [10, 20]      *
>>>
>>> This object I can save and load without problems:
>>>
>>>> save(test, file='test.Rdata')
>>>> rm(test)
>>>> load('test.Rdata')
>>>> test
>>> GRanges with 1 range and 0 metadata columns:
>>>        seqnames    ranges strand
>>>           <Rle> <IRanges>  <Rle>
>>>    [1]     Chr1  [10, 20]      *
>>>
>>>
>>> But if I convert it to a GIntervalTree (for faster overlap finding), I get
>>> a fatal error after loading it:
>>>
>>>> test2 <- GIntervalTree(test)
>>>> test2
>>> GIntervalTree with 1 range and 0 metadata columns:
>>>        seqnames    ranges strand
>>>           <Rle> <IRanges>  <Rle>
>>>    [1]     Chr1  [10, 20]      *
>>>> save(test2, file='test2.Rdata')
>>>> rm(test2)
>>>> load('test2.Rdata')
>>>> test2
>>> GIntervalTree with 1 range and 0 metadata columns:
>>>
>>>   *** caught segfault ***
>>> address 0xc, cause 'memory not mapped'
>>>
>>> Traceback:
>>>   1: .Call(.NAME, ..., PACKAGE = PACKAGE)
>>>   2: .Call2(fun, object@ptr, ..., PACKAGE = "IRanges")
>>>   3: .IntervalForestCall(from, "asIRanges")
>>>   4: asMethod(object)
>>>   5: as(x@ranges, "IRanges")
>>>   6: .GT_reorderValue(x, as(x@ranges, "IRanges"))
>>>   7: .local(x, ...)
>>>   8: ranges(x)
>>>   9: ranges(x)
>>>
>>> Possible actions:
>>> 1: abort (with core dump, if enabled)
>>> 2: normal R exit
>>> 3: exit R without saving workspace
>>> 4: exit R saving workspace
>>>
>>>
>>> My session info:
>>>> sessionInfo()
>>> R version 3.1.0 (2014-04-10)
>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>
>>> locale:
>>> [1] C
>>>
>>> attached base packages:
>>> [1] grDevices datasets  grid      parallel  stats     graphics  utils
>>>     methods   base
>>>
>>> other attached packages:
>>>  [1] spliceR_1.5.0         plyr_1.8.1            RColorBrewer_1.0-5
>>>      VennDiagram_1.6.5     cummeRbund_2.7.1      Gviz_1.9.4
>>>      rtracklayer_1.25.8    GenomicRanges_1.17.14 GenomeInfoDb_1.1.5
>>>      IRanges_1.99.13
>>> [11] S4Vectors_0.0.6       fastcluster_1.1.13    reshape2_1.4
>>>      ggplot2_0.9.3.1       RSQLite_0.11.4        DBI_0.2-7
>>>      BiocGenerics_0.11.2
>>>
>>> loaded via a namespace (and not attached):
>>>  [1] AnnotationDbi_1.27.6     BBmisc_1.6               BSgenome_1.33.5
>>>      BatchJobs_1.2            Biobase_2.25.0           BiocParallel_0.7.0
>>>      Biostrings_2.33.8        Formula_1.1-1            GenomicAlignments_1.1.10
>>> [10] GenomicFeatures_1.17.6   Hmisc_3.14-4             MASS_7.3-33
>>>      R.methodsS3_1.6.1        RCurl_1.95-4.1           Rcpp_0.11.1
>>>      Rsamtools_1.17.14        VariantAnnotation_1.11.5 XML_3.98-1.1
>>> [19] XVector_0.5.6            biomaRt_2.21.0           biovizBase_1.13.7
>>>      bitops_1.0-6             brew_1.0-6               cluster_1.15.2
>>>      codetools_0.2-8          colorspace_1.2-4         dichromat_2.0-0
>>> [28] digest_0.6.4             fail_1.2                 foreach_1.4.2
>>>      gtable_0.1.2             iterators_1.0.7          lattice_0.20-29
>>>      latticeExtra_0.6-26      matrixStats_0.8.14       munsell_0.4.2
>>> [37] proto_0.3-10             scales_0.2.4             sendmailR_1.1-2
>>>      splines_3.1.0            stats4_3.1.0             stringr_0.6.2
>>>      survival_2.37-7          tools_3.1.0              zlibbioc_1.11.1
>>>
>>>
>>>
>>> --
>>> Kindest regards
>>> Kristoffer Vitting-Seerup, cand.scient. (M.Sc.),
>>> Ph.D Fellow
>>> Sandelin Group
>>>
>>> Bioinformatics Centre | Biotech Research & Innovation Centre (BRIC), Dep.
>>> of Biology
>>> University of Copenhagen
>>> Building 1, 3rd floor, office 3 (1-3-03)
>>> Ole Maaløes Vej 5
>>> DK-2200 Copenhagen N
>>> Denmark
>>> http://binf.ku.dk | http://www.bric.ku.dk
>>>
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
