[Bioc-sig-seq] behavior of XStringSet after c() step

Thomas Girke thomas.girke at ucr.edu
Wed Nov 25 18:34:24 CET 2009


Hi Hervé,

Thanks a lot for fixing this so quickly. 

Thomas

On Tue, Nov 24, 2009 at 11:13:50PM -0800, Hervé Pagès wrote:
> Hi Thomas,
> 
> This is fixed in release (Biostrings 2.14.8 / IRanges 1.4.8) and
> devel (Biostrings 2.15.9 / IRanges 1.5.10).
> In addition to the methods you reported below, I found a few more
> methods that were still not supporting XStringSet objects with
> a pool of length > 1 (compact() + the coercion methods from an
> XStringSet subtype (B/DNA/RNA/AA) to another subtype).
> 
> The new versions of Biostrings / IRanges should become available
> thru biocLite() in the next 24 hours.
> 
> Cheers,
> H.
> 
> 
> Thomas Girke wrote:
> >Hi Hervé,
> >
> >Thanks for the clarification. Right now this is just a slight inconvenience,
> >whereas the support for larger object sizes is a very welcome major
> >improvement.
> >
> >Thanks for doing this.
> >
> >Thomas
> >
> >
> >On Mon, Nov 23, 2009 at 05:10:44PM -0800, Hervé Pagès wrote:
> >>Hi Thomas,
> >>
> >>The internals of the XStringSet container changed in BioC 2.5 for two
> >>reasons. First, to support bigger objects, i.e. objects that hold more
> >>than 2^31 letters: the limit is now 2^31 letters per element and 2^31
> >>elements, very much like for standard character vectors. Second, to
> >>support more efficient combining thru c() or append(), which is now
> >>achieved with no copying of the sequence data. reverseComplement(),
> >>reverse(), complement() and chartr() are currently broken on XStringSet
> >>objects that have gone thru combining because of this change in the
> >>internals. Most methods that operate on XStringSet objects were adapted,
> >>but not those 4, for lack of time. I'm working on this right now and
> >>will post again here when it's fixed. Thanks for the reminder and sorry
> >>for the inconvenience.
> >>
> >>Cheers,
> >>H.
> >>
> >>
> >>Thomas Girke wrote:
> >>>Dear List,
> >>>
> >>>Is there an explanation for the difference in behavior between
> >>>XStringSet objects that have gone through an append() or c() step
> >>>and those that haven't? I did not observe this problem in the
> >>>previous R/BioC release.
> >>>
> >>>Below is a simple example to reproduce this error.
> >>>
> >>>Thanks in advance for your help.
> >>>
> >>>Thomas
> >>>
> >>>## Example
> >>>>library(Biostrings)
> >>>>dset1 <- DNAStringSet(c("GCATATTAC", "AATCGATCC", "GCATATTAC"))
> >>>>dset2 <- DNAStringSet(c("CCGCATATTAC", "AAAATCGATCC", "GCATATAATAC"))
> >>>>dset3 <- c(dset1, dset2) # using append() doesn't fix the problem
> >>>>reverseComplement(dset3)
> >>>Error in .local(x, ...) : IRanges internal error: length(x) != 1
> >>>
> >>>>DNAStringSet(dset3, start=1, end=4)
> >>>Error in super(x) : Biostrings internal error: length(x@pool) != 1
> >>>
> >>>## The problem goes away by doing the following
> >>>>dset3fix <- DNAStringSet(unlist(strsplit(toString(dset3), ", ")))
> >>>>reverseComplement(dset3fix)
> >>> A DNAStringSet instance of length 6
> >>>   width seq
> >>>[1]     9 GTAATATGC
> >>>[2]     9 GGATCGATT
> >>>[3]     9 GTAATATGC
> >>>[4]    11 GTAATATGCGG
> >>>[5]    11 GGATCGATTTT
> >>>[6]    11 GTATTATATGC
> >>>
> >>>
> >>>>DNAStringSet(dset3fix, start=1, end=4)
> >>> A DNAStringSet instance of length 6
> >>>   width seq
> >>>[1]     4 GCAT
> >>>[2]     4 AATC
> >>>[3]     4 GCAT
> >>>[4]     4 CCGC
> >>>[5]     4 AAAA
> >>>[6]     4 GCAT
> >>>
> >>>
> >>>>sessionInfo()
> >>>R version 2.10.0 (2009-10-26)
> >>>x86_64-unknown-linux-gnu
> >>>
> >>>locale:
> >>>[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
> >>>[9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >>>
> >>>attached base packages:
> >>>[1] stats     graphics  grDevices utils     datasets  methods   base
> >>>
> >>>other attached packages:
> >>>[1] Biostrings_2.14.1 IRanges_1.4.3
> >>>
> >>>loaded via a namespace (and not attached):
> >>>[1] Biobase_2.6.0
> >>>
> >>>_______________________________________________
> >>>Bioc-sig-sequencing mailing list
> >>>Bioc-sig-sequencing at r-project.org
> >>>https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
> >>-- 
> >>Hervé Pagès
> >>
> >>Program in Computational Biology
> >>Division of Public Health Sciences
> >>Fred Hutchinson Cancer Research Center
> >>1100 Fairview Ave. N, M2-B876
> >>P.O. Box 19024
> >>Seattle, WA 98109-1024
> >>
> >>E-mail: hpages at fhcrc.org
> >>Phone:  (206) 667-5791
> >>Fax:    (206) 667-1319
> >>
> 
>
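
The reproduction from the original report can be re-run as a check once
the updated packages are installed. This is only a minimal sketch,
assuming a Biostrings at or beyond the fixed versions named above; the
names rc and sub4 are illustrative, and the expected strings are the ones
printed for the strsplit() workaround in the report.

```r
library(Biostrings)

# Same input as in the original report.
dset1 <- DNAStringSet(c("GCATATTAC", "AATCGATCC", "GCATATTAC"))
dset2 <- DNAStringSet(c("CCGCATATTAC", "AAAATCGATCC", "GCATATAATAC"))
dset3 <- c(dset1, dset2)  # combined object, the case that used to fail

# The two calls that raised internal errors before the fix.
rc   <- reverseComplement(dset3)
sub4 <- DNAStringSet(dset3, start=1, end=4)

# Compare against the output shown for the workaround in the report.
stopifnot(identical(as.character(rc),
                    c("GTAATATGC", "GGATCGATT", "GTAATATGC",
                      "GTAATATGCGG", "GGATCGATTTT", "GTATTATATGC")))
stopifnot(identical(as.character(sub4),
                    c("GCAT", "AATC", "GCAT", "CCGC", "AAAA", "GCAT")))
```

If both stopifnot() calls pass silently, the combined object behaves like
one built directly from a character vector.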
