[Bioc-sig-seq] behavior of XStringSet after c() step

Hervé Pagès hpages at fhcrc.org
Tue Nov 24 02:10:44 CET 2009


Hi Thomas,

The internals of the XStringSet container changed in BioC 2.5, for two
reasons. The first is to support bigger objects: an object can now hold
more than 2^31 letters in total, the limit now being 2^31 letters per
element and 2^31 elements, very much like for standard character
vectors. The second is to make combining with c() or append() more
efficient (this is now achieved with no copying of the sequence data).
reverseComplement(), reverse(), complement() and chartr() are currently
broken on XStringSet objects that have gone through combining because of
this change in the internals. Most methods that operate on XStringSet
objects were adapted, but I did not have time to adapt these 4. I'm
working on this right now and will post again here when it's fixed.
Thanks for the reminder and sorry for the inconvenience.
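
In the meantime, a somewhat simpler variant of the workaround from the
original post might look like the sketch below. It assumes that
as.character() still works on a combined object in the affected
release (the fact that toString() works suggests so, but this has not
been checked against Biostrings 2.14.1). The names dset1, dset2, dset3
and dset3fix are taken from the example in the quoted message.

```r
library(Biostrings)

dset1 <- DNAStringSet(c("GCATATTAC", "AATCGATCC", "GCATATTAC"))
dset2 <- DNAStringSet(c("CCGCATATTAC", "AAAATCGATCC", "GCATATAATAC"))
dset3 <- c(dset1, dset2)  # combined object, the case that triggers the error

# Assumption: going through a plain character vector rebuilds the
# object from scratch, so it no longer carries the "combined" internals.
dset3fix <- DNAStringSet(as.character(dset3))

rc <- reverseComplement(dset3fix)
rc
```

This copies the sequence data, so it gives up the no-copy benefit of
the new c(), and it is only meant as a stopgap for small objects.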

Cheers,
H.


Thomas Girke wrote:
> Dear List,
> 
> Is there an explanation for the difference in behavior between
> XStringSet objects that have gone through an append() or c() step
> and those that haven't? I did not observe this problem in the
> previous R/BioC release.
> 
> Below is a simple example to reproduce this error.
> 
> Thanks in advance for your help.
> 
> Thomas
> 
> ## Example
>> library(Biostrings)
>> dset1 <- DNAStringSet(c("GCATATTAC", "AATCGATCC", "GCATATTAC"))
>> dset2 <- DNAStringSet(c("CCGCATATTAC", "AAAATCGATCC", "GCATATAATAC"))
>> dset3 <- c(dset1, dset2) # using append() doesn't fix the problem
> 
>> reverseComplement(dset3)
> Error in .local(x, ...) : IRanges internal error: length(x) != 1
> 
>> DNAStringSet(dset3, start=1, end=4)
> Error in super(x) : Biostrings internal error: length(x@pool) != 1
> 
> ## The problem goes away by doing the following
>> dset3fix <- DNAStringSet(unlist(strsplit(toString(dset3), ", ")))
> 
>> reverseComplement(dset3fix)
>   A DNAStringSet instance of length 6
>     width seq
> [1]     9 GTAATATGC
> [2]     9 GGATCGATT
> [3]     9 GTAATATGC
> [4]    11 GTAATATGCGG
> [5]    11 GGATCGATTTT
> [6]    11 GTATTATATGC
> 
> 
>> DNAStringSet(dset3fix, start=1, end=4)
>   A DNAStringSet instance of length 6
>     width seq
> [1]     4 GCAT
> [2]     4 AATC
> [3]     4 GCAT
> [4]     4 CCGC
> [5]     4 AAAA
> [6]     4 GCAT
> 
> 
>> sessionInfo()
> R version 2.10.0 (2009-10-26)
> x86_64-unknown-linux-gnu
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] Biostrings_2.14.1 IRanges_1.4.3
> 
> loaded via a namespace (and not attached):
> [1] Biobase_2.6.0
> 

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


