[R] break string at specified positions

Hervé Pagès hpages at fredhutch.org
Fri May 13 11:48:21 CEST 2016


Hi,

Here is the Biostrings solution in case you need to chop a long
string into hundreds or thousands of fragments (a situation where
base::substring() is very inefficient):

   library(Biostrings)

   ## Call as.character() on the result if you want it back as
   ## a character vector.
   fast_chop_string <- function(x, ends)
   {
     if (!is(x, "XString"))
         x <- as(x, "XString")
     extractAt(x, at=PartitioningByEnd(ends))
   }
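
A quick sanity check on a toy string might look like the sketch below. The input string and end positions are made up for illustration, the function body is repeated only so the snippet runs on its own, and the requireNamespace() guard is an addition so nothing breaks when Biostrings is not installed.

```r
# Toy check of fast_chop_string(); 'toy' and the end positions are
# illustrative values, not taken from the thread.
if (requireNamespace("Biostrings", quietly = TRUE)) {
  library(Biostrings)

  fast_chop_string <- function(x, ends)
  {
    if (!is(x, "XString"))
        x <- as(x, "XString")
    extractAt(x, at=PartitioningByEnd(ends))
  }

  toy <- "abcdefghij"
  # Fragments end at positions 3, 7 and 10; convert back to character.
  fragments <- as.character(fast_chop_string(toy, c(3L, 7L, 10L)))
  print(fragments)
}
```

Each end position closes one fragment and the next fragment starts right after it, so no start positions need to be supplied.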

This will be much faster than substring() (e.g. 100x or 1000x) when
chopping a string like a human chromosome into hundreds or
thousands of fragments.
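
For comparison, here is a minimal base R sketch along the lines of Jim's corrected chop_string() quoted below. The input checks and the sample sentence are additions for illustration and are not part of his post; the end positions are the ones from the original question.

```r
# Base R baseline: derive the start positions from the end positions,
# then let substring() do the extraction.
chop_string <- function(x, ends) {
  # Assumed sanity checks (not in the original post): one string,
  # at least one end position, strictly increasing ends.
  stopifnot(is.character(x), length(x) == 1L,
            length(ends) >= 1L,
            !is.unsorted(ends, strictly = TRUE))
  # Each fragment starts one character after the previous one ends.
  starts <- c(1L, head(ends, -1L) + 1L)
  substring(x, starts, ends)
}

pieces <- chop_string("Lorem ipsum dolor sit", c(5, 10, 19))
print(pieces)
```

With ends c(5,10,19) the computed starts are c(1,6,11), which is exactly the 'first' argument the original question wanted to avoid writing by hand.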

Biostrings is a Bioconductor package:

   https://bioconductor.org/packages/Biostrings

Cheers,
H.


On 05/12/2016 01:18 AM, Jan Kacaba wrote:
> Nice solution Jim, thank you.
>
>
>
> 2016-05-12 2:45 GMT+02:00 Jim Lemon <drjimlemon at gmail.com>:
>> Hi again,
>> Sorry, that should be:
>>
>> chop_string<-function(x,ends) {
>>   starts<-c(1,ends[-length(ends)]+1)
>>   return(substring(x,starts,ends))
>> }
>>
>> Jim
>>
>> On Thu, May 12, 2016 at 10:05 AM, Jim Lemon <drjimlemon at gmail.com> wrote:
>>> Hi Jan,
>>> This might be helpful:
>>>
>>> chop_string<-function(x,ends) {
>>>   starts<-c(1,ends[-length(ends)]-1)
>>>   return(substring(x,starts,ends))
>>> }
>>>
>>> Jim
>>>
>>>
>>> On Thu, May 12, 2016 at 7:23 AM, Jan Kacaba <jan.kacaba at gmail.com> wrote:
>>>> Here is my attempt at a function that computes margins from positions.
>>>>
>>>> require("stringr")
>>>> require("dplyr")
>>>>
>>>> ends<-seq(10,100,8)  # end margins
>>>> test_string<-paste("Lorem ipsum dolor sit amet, consectetuer adipiscing",
>>>>   "elit. Aliquam in lorem sit amet leo accumsan lacinia.")
>>>>
>>>> sekoj=function(ends){
>>>>    l_ends<-length(ends)
>>>>    begs=vector(mode="integer",l_ends)
>>>>    begs[1]=1
>>>>    for (i in seq_len(l_ends)[-1]){
>>>>      begs[i]<-ends[i-1]+1
>>>>    }
>>>>    margs<-rbind(begs,ends)
>>>>    margs<-cbind(margs,c(ends[l_ends]+1,-1))
>>>>    #rownames(margs)<-c("beg","end")
>>>>    return(margs)
>>>> }
>>>> margins<-sekoj(ends)
>>>> str_sub(test_string,margins[1,],margins[2,]) %>% print
>>>>
>>>> Code to run in browser:
>>>> http://www.r-fiddle.org/#/fiddle?id=rVmNVxDV
>>>>
>>>> 2016-05-11 23:12 GMT+02:00 Bert Gunter <bgunter.4567 at gmail.com>:
>>>>> Dunno -- but you might have a look at Hadley Wickham's 'stringr' package:
>>>>> https://cran.r-project.org/web/packages/stringr/stringr.pdf
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Bert
>>>>>
>>>>>
>>>>> Bert Gunter
>>>>>
>>>>> "The trouble with having an open mind is that people keep coming along
>>>>> and sticking things into it."
>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>>
>>>>>
>>>>> On Wed, May 11, 2016 at 1:12 PM, Jan Kacaba <jan.kacaba at gmail.com> wrote:
>>>>>> Dear R-help
>>>>>>
>>>>>> I would like to split a long string at specified precomputed positions.
>>>>>> 'substring' needs beginnings and ends. Is there a native function which
>>>>>> accepts positions so I don't have to compute the second argument?
>>>>>>
>>>>>> For example, I have a vector of positions pos<-c(5,10,19). substring()
>>>>>> needs the inputs first=c(1,6,11) and last=c(5,10,19). It is no problem
>>>>>> to write my own function. Just asking.
>>>>>>
>>>>>> Derek
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


