[R] splitting a vector of strings...
Jonathan Greenberg
greenberg at ucdavis.edu
Fri Oct 23 06:50:41 CEST 2009
William et al:
Thanks! I think I have a somewhat more complicated issue due to the
type of string I'm using -- the split is " | " (space pipe space) -- how
do I code that based on your sub code below? Using " | *" doesn't seem
to be working. Thanks!
--j
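
A possible reason " | *" does not work: in a regular expression an unescaped | means alternation, so that pattern reads as "a space, or zero or more spaces" rather than a literal pipe. The sketch below shows two ways around this, escaping the pipe in the sub patterns or passing fixed = TRUE to strsplit. The sample vector s is hypothetical, since the actual strings are not shown in this thread, and the sub version assumes the separator occurs exactly once per string.

```r
# Hypothetical sample data; the real strings are not shown in the thread.
s <- c("alpha | beta", "gamma | delta")

# Escape the pipe so it matches a literal "|" instead of acting as alternation.
left  <- sub(" \\| .*", "", s)   # drop the separator and everything after it
right <- sub(".* \\| ", "", s)   # drop everything up to and including the separator

# With fixed = TRUE the separator is treated as a literal string, not a regex.
parts <- strsplit(s, " | ", fixed = TRUE)
```

A character class, as in " [|] ", is another way to write the literal pipe if the doubled backslash is hard to read.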
William Dunlap wrote:
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Jonathan Greenberg
>> Sent: Thursday, October 22, 2009 7:35 PM
>> To: r-help
>> Subject: [R] splitting a vector of strings...
>>
>> Quick question -- if I have a vector of strings that I'd like to split
>> into two new vectors based on a substring that is inside of each string,
>> what is the most efficient way to do this? The substring that I want to
>> split on is multiple characters, if that matters, and it is contained in
>> every element of the character vector.
>>
>
> strsplit and sub can both be used for this. If you know
> the string will be split into 2 parts then 2 calls to sub
> with slightly different patterns will do it. strsplit requires
> less fiddling with the pattern and is handier when the number
> of parts is variable or large. strsplit's output often needs to
> be rearranged for convenient use.
>
> E.g., I made 100,000 strings with a 'qaz' in their middles with
> x<-paste("X",sample(1e5),sep="")
> y<-sub("X","Y",x)
> xy<-paste(x,y,sep="qaz")
> and split them by the 'qaz' in two ways:
> system.time(ret1<-list(x=sub("qaz.*","",xy),y=sub(".*qaz","",xy)))
> # user system elapsed
> # 0.22 0.00 0.21
>
> system.time({
>   tmp<-strsplit(xy,"qaz")
>   ret2<-list(x=unlist(lapply(tmp,`[`,1)),y=unlist(lapply(tmp,`[`,2)))
> })
> # user system elapsed
> # 2.42 0.00 2.20
> identical(ret1,ret2)
> #[1] TRUE
> identical(ret1$x,x) && identical(ret1$y,y)
> #[1] TRUE
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
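
The remark above that strsplit's output often needs rearranging can also be illustrated without lapply. This is only a sketch on a small hypothetical vector shaped like xy, and it assumes every string splits into exactly two pieces; it makes no claim about timing relative to the versions above.

```r
# Small hypothetical input in the same shape as the 'xy' vector above.
xy <- c("X1qazY1", "X2qazY2", "X3qazY3")

# strsplit returns a list with one character vector per input string.
tmp <- strsplit(xy, "qaz", fixed = TRUE)

# Stack the pieces into a matrix with one row per input string
# (assumes each element of tmp has exactly two parts).
m <- do.call(rbind, tmp)

# Pull the columns out as the two new vectors.
ret <- list(x = m[, 1], y = m[, 2])
```

If some strings could split into a different number of pieces, the rbind step would recycle or misalign values, so the lapply form above is the safer choice in that case.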
>> --j
>>
>> --
>>
>> Jonathan A. Greenberg, PhD
>> Postdoctoral Scholar
>> Center for Spatial Technologies and Remote Sensing (CSTARS)
>> University of California, Davis
>> One Shields Avenue
>> The Barn, Room 250N
>> Davis, CA 95616
>> Phone: 415-763-5476
>> AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307