[R] break string at specified positions

Jan Kacaba jan.kacaba at gmail.com
Tue May 17 16:28:19 CEST 2016


Excellent Hervé, thank you.

2016-05-13 11:48 GMT+02:00 Hervé Pagès <hpages at fredhutch.org>:
> Hi,
>
> Here is the Biostrings solution in case you need to chop a long
> string into hundreds or thousands of fragments (a situation where
> base::substring() is very inefficient):
>
>   library(Biostrings)
>
>   ## Call as.character() on the result if you want it back as
>   ## a character vector.
>   fast_chop_string <- function(x, ends)
>   {
>     if (!is(x, "XString"))
>         x <- as(x, "XString")
>     extractAt(x, at=PartitioningByEnd(ends))
>   }
>
> This will be much faster than substring() (e.g. 100x or 1000x) when
> chopping a string like a human chromosome into hundreds or
> thousands of fragments.
>
> Biostrings is a Bioconductor package:
>
>   https://bioconductor.org/packages/Biostrings
>
> Cheers,
> H.
>
>
>
> On 05/12/2016 01:18 AM, Jan Kacaba wrote:
>>
>> Nice solution Jim, thank you.
>>
>>
>>
>> 2016-05-12 2:45 GMT+02:00 Jim Lemon <drjimlemon at gmail.com>:
>>>
>>> Hi again,
>>> Sorry, that should be:
>>>
>>> chop_string<-function(x,ends) {
>>>   starts<-c(1,ends[-length(ends)]+1)
>>>   return(substring(x,starts,ends))
>>> }
>>>
>>> Jim
>>>
>>> On Thu, May 12, 2016 at 10:05 AM, Jim Lemon <drjimlemon at gmail.com> wrote:
>>>>
>>>> Hi Jan,
>>>> This might be helpful:
>>>>
>>>> chop_string<-function(x,ends) {
>>>>   starts<-c(1,ends[-length(ends)]-1)
>>>>   return(substring(x,starts,ends))
>>>> }
>>>>
>>>> Jim
>>>>
>>>>
>>>> On Thu, May 12, 2016 at 7:23 AM, Jan Kacaba <jan.kacaba at gmail.com>
>>>> wrote:
>>>>>
>>>>> Here is my attempt at a function that computes margins from positions.
>>>>>
>>>>> require("stringr")
>>>>> require("dplyr")
>>>>>
>>>>> ends<-seq(10,100,8)  # end margins
>>>>> test_string<-"Lorem ipsum dolor sit amet, consectetuer adipiscing
>>>>> elit. Aliquam in lorem sit amet leo accumsan lacinia."
>>>>>
>>>>> sekoj=function(ends){
>>>>>    l_ends<-length(ends)
>>>>>    begs=vector(mode="integer",l_ends)
>>>>>    begs[1]=1
>>>>>    # seq_len() keeps the loop empty when there is only one end position
>>>>>    for (i in seq_len(l_ends-1)+1){
>>>>>      begs[i]<-ends[i-1]+1
>>>>>    }
>>>>>    margs<-rbind(begs,ends)
>>>>>    margs<-cbind(margs,c(ends[l_ends]+1,-1))
>>>>>    #rownames(margs)<-c("beg","end")
>>>>>    return(margs)
>>>>> }
>>>>> margins<-sekoj(ends)
>>>>> str_sub(test_string,margins[1,],margins[2,]) %>% print
>>>>>
>>>>> Code to run in browser:
>>>>> http://www.r-fiddle.org/#/fiddle?id=rVmNVxDV
>>>>>
>>>>> 2016-05-11 23:12 GMT+02:00 Bert Gunter <bgunter.4567 at gmail.com>:
>>>>>>
>>>>>> Dunno -- but you might have a look at Hadley Wickham's 'stringr'
>>>>>> package:
>>>>>> https://cran.r-project.org/web/packages/stringr/stringr.pdf
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Bert
>>>>>>
>>>>>>
>>>>>> Bert Gunter
>>>>>>
>>>>>> "The trouble with having an open mind is that people keep coming along
>>>>>> and sticking things into it."
>>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>>>
>>>>>>
>>>>>> On Wed, May 11, 2016 at 1:12 PM, Jan Kacaba <jan.kacaba at gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Dear R-help
>>>>>>>
>>>>>>> I would like to split a long string at specified precomputed positions.
>>>>>>> 'substring' needs beginnings and ends. Is there a native function
>>>>>>> which accepts positions so I don't have to compute the second
>>>>>>> argument?
>>>>>>>
>>>>>>> For example, I have a vector of positions pos<-c(5,10,19). substring()
>>>>>>> needs input first=c(1,6,11) and last=c(5,10,19). It is no problem
>>>>>>> to write my own function. Just asking.
>>>>>>>
>>>>>>> Derek
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>>
>>
>>
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fredhutch.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
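
For reference, the base R approach from the thread can be checked against the example in the original question (pos<-c(5,10,19), which should give first=c(1,6,11) and last=c(5,10,19)). The following is only a minimal sketch: the helper name chop_at and its keep_tail argument are illustrative assumptions and not part of base R or of any package mentioned above. It assumes a single string and increasing end positions.

```r
# Hypothetical helper (name and keep_tail argument are illustrative only).
# Splits one string after each position in 'ends', using the same
# starts/ends arithmetic as the corrected chop_string() in the thread.
chop_at <- function(x, ends, keep_tail = FALSE) {
  stopifnot(is.character(x), length(x) == 1L)
  n <- nchar(x)
  # Optionally append the remainder of the string as a final fragment.
  if (keep_tail && (length(ends) == 0L || ends[length(ends)] < n)) {
    ends <- c(ends, n)
  }
  # Nothing to cut: return an empty character vector.
  if (length(ends) == 0L) return(character(0))
  # Each fragment starts one character after the previous end.
  starts <- c(1, ends[-length(ends)] + 1)
  substring(x, starts, ends)
}

pos <- c(5, 10, 19)
pieces <- chop_at("Lorem ipsum dolor sit amet", pos)
print(pieces)

# With keep_tail = TRUE the fragments paste back into the input.
all_pieces <- chop_at("Lorem ipsum dolor sit amet", pos, keep_tail = TRUE)
print(paste0(all_pieces, collapse = ""))
```

Without keep_tail, any characters after the last position are dropped, which matches what substring(x, starts, ends) does in the thread. For very long strings and many fragments, the Biostrings approach quoted above is likely the better choice.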


