[Bioc-devel] cbind SummarizedExperiments containing a DNAStringSet not working

Maarten van Iterson mviterson at gmail.com
Tue Apr 4 08:57:21 CEST 2017


Thanks for looking into this!

Maarten

On Mon, Apr 3, 2017 at 7:00 PM, Hervé Pagès <hpages at fredhutch.org> wrote:

> Hi Maarten,
>
> identical() is not reliable on DNAStringSet objects or other objects
> that contain external pointers as it can return false negatives as well
> as false positives. We'll fix the "cbind" and "rbind" methods for
> SummarizedExperiment to work around this problem.
>
> Thanks for the report.
>
> H.
>
>
> On 04/03/2017 12:58 AM, Maarten van Iterson wrote:
>
>> Dear list,
>>
>> Combining SummarizedExperiment objects that contain a DNAStringSet in the
>> rowData does not seem to work properly. If I cbind two SummarizedExperiment
>> objects, which I know are identical, an error is reported:
>>
>> Error in FUN(X[[i]], ...) (from #2) :
>>   column(s) 'sourceSeq' in ‘mcols’ are duplicated and the data do not
>> match
>>
>> I think I traced the problem to `SummarizedExperiment:::.compare`, where
>> `identical` is used to compare DNAStringSets and does not behave as
>> expected: it reports that the objects differ when they are in fact
>> identical.
>>
>> Here is a counterexample (which was easier to construct) showing the
>> opposite failure: `identical` returns TRUE where it should return FALSE.
>>
>> > library(Biostrings)
>> > seq1 <- paste(DNA_BASES[sample(1:4,5,replace=T)], collapse="")
>> > seq2 <- paste(DNA_BASES[sample(1:4,5,replace=T)], collapse="")
>> >
>> > seq1
>> [1] "GACTC"
>> > seq2
>> [1] "GAATG"
>> >
>> > s1 <- DNAStringSet(seq1)
>> > s2 <- DNAStringSet(seq2)
>> >
>> > str(s1)
>> Formal class 'DNAStringSet' [package "Biostrings"] with 5 slots
>>   ..@ pool           :Formal class 'SharedRaw_Pool' [package "XVector"] with 2 slots
>>   .. .. ..@ xp_list                    :List of 1
>>   .. .. .. ..$ :<externalptr>
>>   .. .. ..@ .link_to_cached_object_list:List of 1
>>   .. .. .. ..$ :<environment:0x71f94d0>
>>   ..@ ranges         :Formal class 'GroupedIRanges' [package "XVector"] with 7 slots
>>   .. .. ..@ group          : int 1
>>   .. .. ..@ start          : int 1
>>   .. .. ..@ width          : int 5
>>   .. .. ..@ NAMES          : NULL
>>   .. .. ..@ elementType    : chr "integer"
>>   .. .. ..@ elementMetadata: NULL
>>   .. .. ..@ metadata       : list()
>>   ..@ elementType    : chr "DNAString"
>>   ..@ elementMetadata: NULL
>>   ..@ metadata       : list()
>> > str(s2)
>> Formal class 'DNAStringSet' [package "Biostrings"] with 5 slots
>>   ..@ pool           :Formal class 'SharedRaw_Pool' [package "XVector"] with 2 slots
>>   .. .. ..@ xp_list                    :List of 1
>>   .. .. .. ..$ :<externalptr>
>>   .. .. ..@ .link_to_cached_object_list:List of 1
>>   .. .. .. ..$ :<environment:0x71f94d0>
>>   ..@ ranges         :Formal class 'GroupedIRanges' [package "XVector"] with 7 slots
>>   .. .. ..@ group          : int 1
>>   .. .. ..@ start          : int 1
>>   .. .. ..@ width          : int 5
>>   .. .. ..@ NAMES          : NULL
>>   .. .. ..@ elementType    : chr "integer"
>>   .. .. ..@ elementMetadata: NULL
>>   .. .. ..@ metadata       : list()
>>   ..@ elementType    : chr "DNAString"
>>   ..@ elementMetadata: NULL
>>   ..@ metadata       : list()
>> >
>> > identical(seq1, seq2)
>> [1] FALSE
>> > identical(s1, s2)
>> [1] TRUE
>> > seq1 == seq2
>> [1] FALSE
>> > s1 == s2
>> [1] FALSE
>> >
>> > sessionInfo()
>> R version 3.3.2 (2016-10-31)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 16.04.2 LTS
>>
>> locale:
>>  [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
>>  [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
>>  [7] LC_PAPER=en_US.utf8       LC_NAME=C
>>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats4    stats     graphics  grDevices utils     datasets
>> [8] methods   base
>>
>> other attached packages:
>>  [1] Biostrings_2.42.1          XVector_0.14.1
>>  [3] BBMRIomics_1.0.3           SummarizedExperiment_1.4.0
>>  [5] Biobase_2.34.0             GenomicRanges_1.26.4
>>  [7] GenomeInfoDb_1.10.3        IRanges_2.8.2
>>  [9] S4Vectors_0.12.2           BiocGenerics_0.20.0
>>
>> loaded via a namespace (and not attached):
>>  [1] Rcpp_0.12.10             AnnotationDbi_1.36.2     hms_0.3
>>  [4] GenomicAlignments_1.10.1 zlibbioc_1.20.0          BiocParallel_1.8.1
>>  [7] BSgenome_1.42.0          lattice_0.20-35          R6_2.2.0
>> [10] httr_1.2.1               tools_3.3.2              grid_3.3.2
>> [13] DBI_0.6                  assertthat_0.1           digest_0.6.12
>> [16] tibble_1.2               Matrix_1.2-8             readr_1.1.0
>> [19] rtracklayer_1.34.2       bitops_1.0-6             biomaRt_2.30.0
>> [22] RCurl_1.95-4.8           memoise_1.0.0            RSQLite_1.1-2
>> [25] compiler_3.3.2           GenomicFeatures_1.26.3   Rsamtools_1.26.1
>> [28] XML_3.98-1.5             jsonlite_1.3             VariantAnnotation_1.20.3
>>
>> I don't completely understand why `identical` is not working properly.
>> Is it comparing the environment addresses? In the example above they are
>> the same although the sequences are not. In my case, presumably the two
>> SummarizedExperiments contained the same DNAStringSets but had different
>> environment addresses?
>>
>> Regards,
>> Maarten
>>
>>         [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fredhutch.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>
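
Editorial illustration of the point discussed in the quoted messages: a
minimal base-R sketch, not the Biostrings or SummarizedExperiment code, of
why identical() can report a difference between two objects whose contents
match. It assumes only that identical() compares environments by reference.
The names make_holder and same_content are hypothetical helpers invented for
this sketch, and the list-with-an-environment is just a stand-in for an
object that keeps its data behind a reference. The false-positive direction
shown in the thread (identical(s1, s2) returning TRUE) is not reproduced
here.

```r
# Hypothetical stand-in for an object whose data lives behind a reference.
make_holder <- function(env) {
  list(store = env, start = 1L, width = 5L)
}

# Two separate environments holding the same sequence.
env_a <- new.env()
env_b <- new.env()
assign("seq", "GACTC", envir = env_a)
assign("seq", "GACTC", envir = env_b)

a <- make_holder(env_a)
b <- make_holder(env_b)

# identical() compares the two environments by reference, so it sees
# two different objects even though the stored sequences are equal.
by_reference <- identical(a, b)

# Hypothetical content-based comparison: look at the stored data and the
# plain fields instead of the references themselves.
same_content <- function(x, y) {
  identical(get("seq", envir = x$store), get("seq", envir = y$store)) &&
    identical(x$start, y$start) &&
    identical(x$width, y$width)
}
by_content <- same_content(a, b)

print(c(by_reference = by_reference, by_content = by_content))
```

Here by_reference is FALSE and by_content is TRUE. For real DNAStringSet
objects, one content-based check along the same lines might be
identical(as.character(s1), as.character(s2)); that is a suggestion only and
not necessarily what the planned cbind/rbind fix will use.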
