[Bioc-devel] BStringSet Documentation

Hervé Pagès hpages at fredhutch.org
Fri Sep 2 21:25:58 CEST 2016


Hi,

On 09/01/2016 12:00 AM, Dario Strbenac wrote:
> Good day,
>
> According to the documentation, I wouldn't think that substr or strsplit would work on a BStringSet, but substr does.
>
>> IDs
>   A BStringSet instance of length 5
>     width seq
> [1]    61 D00626:168:C9CWMANXX:1:1105:1816:1998 1:N:0:TCCGGAGA+ATAGAGGC
> [2]    61 D00626:168:C9CWMANXX:1:1105:2113:1989 1:N:0:TCCGGAGA+ATAGAGGC
> [3]    61 D00626:168:C9CWMANXX:1:1105:2703:1986 1:N:0:TCCGGAGA+ATAGAGGC
> [4]    61 D00626:168:C9CWMANXX:1:1105:3255:1979 1:N:0:TCCGGAGA+ATAGAGGC
> [5]    61 D00626:168:C9CWMANXX:1:1105:4525:1995 1:N:0:TCCGGAGA+ATAGAGGC
>> substr(IDs, 1, 37)
> [1] "D00626:168:C9CWMANXX:1:1105:1816:1998"
> [2] "D00626:168:C9CWMANXX:1:1105:2113:1989"
> [3] "D00626:168:C9CWMANXX:1:1105:2703:1986"
> [4] "D00626:168:C9CWMANXX:1:1105:3255:1979"
> [5] "D00626:168:C9CWMANXX:1:1105:4525:1995"
>> strsplit(IDs, ' ')
> Error in strsplit(IDs, " ") : non-character argument
>
> I think that both of these functions shouldn't work or both should work, to be consistent.

Why? Because they both have "str" in their name?

It sounds like you are expecting every string manipulation function
defined in base R to work on a BStringSet object. Well, that's not
the case and I don't think that's ever going to happen. Some of them
work and some of them don't. We can add more if needed (e.g. strsplit)
but there are things like the grep family that BStringSet objects will
probably never support.

If you need to strsplit() an XStringSet object, you can use this:

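   library(Biostrings)  # provides vmatchPattern() and extractAt()

   strsplitXStringSet <- function(x, split)
   {
       ## Find all the matches of the fixed pattern 'split' in each sequence.
       m <- vmatchPattern(split, x)
       ## The regions between the matches are the fields to extract.
       at <- gaps(IRangesList(start=start(m), end=end(m)),
                  start=1L, end=width(x))
       extractAt(x, at)
   }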

It behaves like strsplit(x, split, fixed=TRUE), except when there is
a match at the beginning or end of one of the sequences (in which
case strsplit() has questionable behavior). Also, unlike strsplit(),
strsplitXStringSet() doesn't support an empty split pattern.
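
For reference, here is a minimal sketch of the base R strsplit() edge
cases mentioned above. It uses only base R (no Biostrings), and the
example strings are made up for illustration:

```r
# A match at the beginning of the string produces a leading "" field ...
lead <- strsplit(" a b", " ", fixed=TRUE)[[1]]
print(lead)

# ... but a match at the end of the string produces no trailing "" field.
trail <- strsplit("a b ", " ", fixed=TRUE)[[1]]
print(trail)

# An empty split pattern splits the string into single characters.
chars <- strsplit("abc", "", fixed=TRUE)[[1]]
print(chars)
```

The asymmetry between the first two results is presumably the
"questionable behavior" referred to above: a leading separator yields
an empty first field, while a trailing separator is silently dropped.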

Note that BStringSet objects have supported the reverse operation
for a while. See ?unstrsplit

I'll add strsplitXStringSet() to Biostrings, as the "strsplit" method
for XStringSet objects.

H.

>
> --------------------------------------
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


