[Bioc-devel] BStringSet Documentation

Hervé Pagès hpages at fredhutch.org
Sat Sep 3 01:31:36 CEST 2016


On 09/02/2016 12:25 PM, Hervé Pagès wrote:
> Hi,
>
> On 09/01/2016 12:00 AM, Dario Strbenac wrote:
>> Good day,
>>
>> According to the documentation, I wouldn't think that substr or
>> strsplit would work on a BStringSet, but substr does.
>>
>>> IDs
>>   A BStringSet instance of length 5
>>     width seq
>> [1]    61 D00626:168:C9CWMANXX:1:1105:1816:1998 1:N:0:TCCGGAGA+ATAGAGGC
>> [2]    61 D00626:168:C9CWMANXX:1:1105:2113:1989 1:N:0:TCCGGAGA+ATAGAGGC
>> [3]    61 D00626:168:C9CWMANXX:1:1105:2703:1986 1:N:0:TCCGGAGA+ATAGAGGC
>> [4]    61 D00626:168:C9CWMANXX:1:1105:3255:1979 1:N:0:TCCGGAGA+ATAGAGGC
>> [5]    61 D00626:168:C9CWMANXX:1:1105:4525:1995 1:N:0:TCCGGAGA+ATAGAGGC
>>> substr(IDs, 1, 37)
>> [1] "D00626:168:C9CWMANXX:1:1105:1816:1998"
>> [2] "D00626:168:C9CWMANXX:1:1105:2113:1989"
>> [3] "D00626:168:C9CWMANXX:1:1105:2703:1986"
>> [4] "D00626:168:C9CWMANXX:1:1105:3255:1979"
>> [5] "D00626:168:C9CWMANXX:1:1105:4525:1995"
>>> strsplit(IDs, ' ')
>> Error in strsplit(IDs, " ") : non-character argument
>>
>> I think that both of these functions shouldn't work or both should
>> work, to be consistent.
>
> Why? Because they both have "str" in their name?
>
> It sounds like you are expecting that every string manipulation function
> defined in base R should work on a BStringSet object. Well that's not
> the case and I don't think that's ever going to happen. Some of them
> work and some of them don't. We can add more if needed (e.g. strsplit)
> but there are things like the grep family that BStringSet objects will
> probably never support.
>
> If you need to strsplit() an XStringSet object, you can use this:
>
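>   library(Biostrings)
>
>   strsplitXStringSet <- function(x, split)
>   {
>       # Find every occurrence of 'split' in each sequence of 'x'.
>       m <- vmatchPattern(split, x)
>       # The gaps between the matches are the pieces to keep.
>       at <- gaps(IRangesList(start=start(m), end=end(m)),
>                  start=1L, end=width(x))
>       # Extract those pieces from each sequence.
>       extractAt(x, at)
>   }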
>
> It's going to behave like strsplit(x, split, fixed=TRUE) except when
> there is a match at the beginning or end of one of the sequences (in
> which case the behavior of strsplit() is questionable). Also, unlike
> strsplit(), strsplitXStringSet() doesn't support an empty split
> pattern.

Another difference between strsplit() and strsplitXStringSet() arises when
some matches are adjacent or overlapping. This will be explained in the
man page of strsplit,XStringSet-method.
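
For reference, here is a minimal sketch of how base R's strsplit() treats
matches at the beginning, at the end, and next to each other. It uses plain
character vectors only (no Biostrings), and the input strings are made up
for illustration; it does not show what strsplitXStringSet() returns.

```r
# Base R only; the inputs below are hypothetical examples.
x <- c(" a b",   # match at the beginning
       "a b ",   # match at the end
       "a  b")   # two adjacent matches

res <- strsplit(x, " ", fixed = TRUE)

# A leading match yields an empty first element ...
print(res[[1]])
# ... but a trailing match yields no empty last element.
print(res[[2]])
# Adjacent matches yield an empty element between them.
print(res[[3]])
```

The asymmetry between the first two cases is documented in ?strsplit.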

H.

>
> Note that BStringSet objects have supported the reverse operation
> for a while. See ?unstrsplit
>
> I'll add strsplitXStringSet() to Biostrings, as the "strsplit" method
> for XStringSet objects.
>
> H.
>
>>
>> --------------------------------------
>> Dario Strbenac
>> University of Sydney
>> Camperdown NSW 2050
>> Australia
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


