[R] Getting many substrings but only loading the original string one time.
murdoch.duncan at gmail.com
Mon Apr 11 22:14:03 CEST 2011
On 11/04/2011 3:48 PM, Jonathan wrote:
> Hi All,
> I'm looking for a way to get many substrings from a longer string and
> then stitch them together. But, since the longer string is really, really
> long (like 250 MB long), I don't want to do this in a loop and load and
> re-load the longer string many times. Does anybody have an idea?
> Maybe I could pass in two vectors (the first would have the starting
> coordinates, and the second would have the stopping coordinates), so it
> would be like a vectorized version of substr, where start and stop would be
> vector instead of single integers.
> Example (I'm reducing the size of the string for the example) of how this
> might work:
> > longerString<- 'HelloThisIsMyLongerString'
> > startVector<- c(2,6,4)
> > stopVector<- c(4,10,5)
> > substrings<- vectorized_substr(longerString, startVector, stopVector)
> > substrings
> [1] "ell"   "ThisI" "lo"
Use substring(), not substr(). It is vectorized over the start and stop positions:
> substring(longerString, startVector, stopVector)
[1] "ell"   "ThisI" "lo"
It does this by replicating longerString, but that doesn't mean
actual copies are made: the replicated elements are just multiple
pointers to the same big string.
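Here is a minimal sketch of that call, using the shortened example values from the question. The variable names are the ones from the original post, and first_only is just an illustrative name for the contrast with substr(). Only base R is assumed.

```r
# Example values from the question (string shortened for illustration).
longerString <- "HelloThisIsMyLongerString"
startVector <- c(2, 6, 4)
stopVector <- c(4, 10, 5)

# substring() recycles the single string across the start/stop vectors,
# so one call returns one piece per (start, stop) pair.
substrings <- substring(longerString, startVector, stopVector)
print(substrings)

# substr() returns a result the same length as its first argument,
# so with a single string only the first (start, stop) pair is used.
first_only <- substr(longerString, startVector, stopVector)
print(first_only)
```

The two vectors should have the same length; if they don't, substring() recycles the shorter one rather than raising an error.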
> Then I'd like to concatenate them (there will be many of them)
> > result<- paste(substrings,collapse='')
> > result
> [1] "ellThisIlo"
> (perhaps the paste command as I've done it is the best way, but depending on
> how the substrings are reported there may be different ways). Thanks!
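For the concatenation step, paste() with collapse = "" applied to the character vector returned by substring() should be enough. A minimal sketch follows; stitch_substrings is a hypothetical helper name introduced here for illustration, not a base R function.

```r
# Hypothetical helper (not part of base R): extract the pieces in one
# vectorized call, then join them into a single string.
stitch_substrings <- function(text, starts, stops) {
  # Assumes a single input string and matching start/stop vectors.
  stopifnot(length(text) == 1L, length(starts) == length(stops))
  pieces <- substring(text, starts, stops)
  paste(pieces, collapse = "")
}

result <- stitch_substrings("HelloThisIsMyLongerString",
                            c(2, 6, 4), c(4, 10, 5))
print(result)
```

The big string is passed once and no explicit loop over the coordinates is needed. Whether this is fast enough on a 250 MB string is something to measure rather than assume.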