[Bioc-devel] cbind SummarizedExperiments containing a DNAStringSet not working
Maarten van Iterson
mviterson at gmail.com
Mon Apr 3 09:58:49 CEST 2017
Dear list,
Combining SummarizedExperiment objects that contain a DNAStringSet in the
rowData does not seem to work properly. If I cbind two SummarizedExperiment
objects that I know are identical, an error is reported:
Error in FUN(X[[i]], ...) (from #2) :
column(s) 'sourceSeq' in ‘mcols’ are duplicated and the data do not match
I think I traced the problem to `SummarizedExperiment:::.compare`, which
uses `identical` to compare DNAStringSets, and `identical` does not behave
as expected for them: it reports the two sets as different although they
are the same. Here is a counterexample for the converse case (which was
easier to construct), showing that `identical` returns TRUE where it should
return FALSE.
> library(Biostrings)
> seq1 <- paste(DNA_BASES[sample(1:4,5,replace=T)], collapse="")
> seq2 <- paste(DNA_BASES[sample(1:4,5,replace=T)], collapse="")
>
> seq1
[1] "GACTC"
> seq2
[1] "GAATG"
>
> s1 <- DNAStringSet(seq1)
> s2 <- DNAStringSet(seq2)
>
> str(s1)
Formal class 'DNAStringSet' [package "Biostrings"] with 5 slots
..@ pool :Formal class 'SharedRaw_Pool' [package "XVector"] with 2 slots
.. .. ..@ xp_list :List of 1
.. .. .. ..$ :<externalptr>
.. .. ..@ .link_to_cached_object_list:List of 1
.. .. .. ..$ :<environment: 0x71f94d0>
..@ ranges :Formal class 'GroupedIRanges' [package "XVector"] with 7 slots
.. .. ..@ group : int 1
.. .. ..@ start : int 1
.. .. ..@ width : int 5
.. .. ..@ NAMES : NULL
.. .. ..@ elementType : chr "integer"
.. .. ..@ elementMetadata: NULL
.. .. ..@ metadata : list()
..@ elementType : chr "DNAString"
..@ elementMetadata: NULL
..@ metadata : list()
> str(s2)
Formal class 'DNAStringSet' [package "Biostrings"] with 5 slots
..@ pool :Formal class 'SharedRaw_Pool' [package "XVector"] with 2 slots
.. .. ..@ xp_list :List of 1
.. .. .. ..$ :<externalptr>
.. .. ..@ .link_to_cached_object_list:List of 1
.. .. .. ..$ :<environment: 0x71f94d0>
..@ ranges :Formal class 'GroupedIRanges' [package "XVector"] with 7 slots
.. .. ..@ group : int 1
.. .. ..@ start : int 1
.. .. ..@ width : int 5
.. .. ..@ NAMES : NULL
.. .. ..@ elementType : chr "integer"
.. .. ..@ elementMetadata: NULL
.. .. ..@ metadata : list()
..@ elementType : chr "DNAString"
..@ elementMetadata: NULL
..@ metadata : list()
>
> identical(seq1, seq2)
[1] FALSE
> identical(s1, s2)
[1] TRUE
> seq1 == seq2
[1] FALSE
> s1 == s2
[1] FALSE
>
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS
locale:
[1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
[3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
[5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
[7] LC_PAPER=en_US.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] Biostrings_2.42.1 XVector_0.14.1
[3] BBMRIomics_1.0.3 SummarizedExperiment_1.4.0
[5] Biobase_2.34.0 GenomicRanges_1.26.4
[7] GenomeInfoDb_1.10.3 IRanges_2.8.2
[9] S4Vectors_0.12.2 BiocGenerics_0.20.0
loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10             AnnotationDbi_1.36.2     hms_0.3
 [4] GenomicAlignments_1.10.1 zlibbioc_1.20.0          BiocParallel_1.8.1
 [7] BSgenome_1.42.0          lattice_0.20-35          R6_2.2.0
[10] httr_1.2.1               tools_3.3.2              grid_3.3.2
[13] DBI_0.6                  assertthat_0.1           digest_0.6.12
[16] tibble_1.2               Matrix_1.2-8             readr_1.1.0
[19] rtracklayer_1.34.2       bitops_1.0-6             biomaRt_2.30.0
[22] RCurl_1.95-4.8           memoise_1.0.0            RSQLite_1.1-2
[25] compiler_3.3.2           GenomicFeatures_1.26.3   Rsamtools_1.26.1
[28] XML_3.98-1.5             jsonlite_1.3             VariantAnnotation_1.20.3
>
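For contrast, a comparison based on the sequence content gives the answer I
would expect. This is only a minimal sketch: `same_sequences` is a
hypothetical helper of my own, not part of Biostrings or
SummarizedExperiment, and it assumes that comparing the character
representation (including names) is an acceptable definition of "equal".

```r
library(Biostrings)

# Hypothetical helper (not a Biostrings function): compare two XStringSet
# objects by their sequence content rather than by their internal slots.
same_sequences <- function(x, y) {
  identical(as.character(x), as.character(y))
}

s1 <- DNAStringSet("GACTC")
s2 <- DNAStringSet("GAATG")
s3 <- DNAStringSet("GACTC")

different <- same_sequences(s1, s2)  # sequences differ
equal     <- same_sequences(s1, s3)  # same sequence, separately constructed
```

Here `different` is FALSE and `equal` is TRUE, which matches what `==`
reports element-wise in the session above.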
I don't completely understand why `identical` is not working properly. Is
it comparing the environment addresses? In the above example they are the
same although the sequences are not. Could it be that in my case the two
SummarizedExperiments contained the same DNAStringSets but had different
environment addresses?
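To illustrate what I suspect, here is a small base R sketch showing that
`identical` compares environments by reference, not by content. Whether
this is really what happens inside a DNAStringSet is an assumption on my
part; the objects `e1`, `e2`, `x` and `y` are made up for the illustration.

```r
# Two separate environments holding exactly the same value.
e1 <- new.env()
e2 <- new.env()
assign("seq", "GACTC", envir = e1)
assign("seq", "GACTC", envir = e2)

# identical() on environments asks "is this the same environment?",
# not "do they hold the same data?".
same_ref <- identical(e1, e1)  # TRUE: same environment
diff_ref <- identical(e1, e2)  # FALSE: equal content, different environments

# Objects that merely contain such environments inherit this behaviour.
x <- list(cache = e1)
y <- list(cache = e2)
wrapped <- identical(x, y)     # FALSE, although the stored data are equal

# Comparing the content explicitly gives the expected answer.
same_content <- identical(as.list(e1), as.list(e2))  # TRUE
```

If a DNAStringSet stores its data behind such a reference, this would
explain both observations: equal data behind different references look
different, and different data behind one shared reference look the same.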
Regards,
Maarten