[R] Separating a Complicated String Vector
David Winsemius
dwinsemius at comcast.net
Sun Jan 4 08:47:35 CET 2015
On Jan 3, 2015, at 9:20 PM, npretnar wrote:
> Sorry. Bad example on my part. Try this. V1 is ...
>
> V1
> alabama
> bates
> tuscaloosa
> smith
> arkansas
> fayette
> little rock
> alaska
> juneau
> nome
>
> And I want:
>
> V1 V2
> alabama bates
> alabama tuscaloosa
> alabama smith
> arkansas fayette
> arkansas little rock
> alaska juneau
> alaska nome
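# Flag rows whose value contains a (lower-cased) state name
dat$is_state <- grepl(tolower(paste(state.name, collapse="|")), dat$V1)
# Running count of state rows: each row gets the index of the latest state above it
dat$thisstate <- cumsum(dat$is_state)
# Pair every non-state row with its state
dat2 <- data.frame(V1 = dat$V1[dat$is_state][dat$thisstate[!dat$is_state]],
                   V2 = dat$V1[!dat$is_state])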
> dat2
        V1         V2
1  alabama      bates
2  alabama tuscaloosa
3  alabama      smith
4 arkansas    fayette
5 arkansas     little
6 arkansas       rock
7   alaska     juneau
8   alaska       nome
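
Note that "little rock" comes out as two rows in the output above, presumably because the sample was read in with whitespace as the field separator. The following is only a minimal, self-contained sketch of the same cumsum idea. It assumes the data has one entry per line, that the first row is a state, and that state rows match an entry of the built-in state.name exactly once lower-cased. The construction of dat here is a hypothetical reconstruction, not the poster's actual data.

```r
# Hypothetical reconstruction of the sample: one entry per element,
# so "little rock" stays a single value.
dat <- data.frame(
  V1 = c("alabama", "bates", "tuscaloosa", "smith",
         "arkansas", "fayette", "little rock",
         "alaska", "juneau", "nome"),
  stringsAsFactors = FALSE
)

# Exact matching against state.name avoids the partial hits an
# unanchored regex can produce (e.g. a place called "kansas city").
dat$is_state <- dat$V1 %in% tolower(state.name)

# Running count of state rows: each row carries the index of the
# most recent state at or above it (assumes row 1 is a state).
dat$thisstate <- cumsum(dat$is_state)

# Look up the state for every non-state row.
states <- dat$V1[dat$is_state]
dat2 <- data.frame(
  V1 = states[dat$thisstate[!dat$is_state]],
  V2 = dat$V1[!dat$is_state],
  stringsAsFactors = FALSE
)
dat2
```

With exact matching, a place that happens to share its name with a state would still be flagged as a state row, so real data may need an extra check.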
--
David.
>
> This is more representative of the problem, extended to all 50 states.
>
> - Nick
>
>
> On Jan 3, 2015, at 9:22 PM, Ista Zahn wrote:
>
>> I'm not sure what's so complicated about that (am I missing
>> something?). You can search using grep, and replace using gsub, so
>>
>> tmpDF <- read.table(text="V1 V2
>> A 5
>> a1 1
>> a2 1
>> a3 1
>> a4 1
>> a5 1
>> B 4
>> b1 1
>> b2 1
>> b3 1
>> b4 1",
>> header=TRUE)
>> tmpDF <- tmpDF[grepl("[0-9]", tmpDF$V1), ]
>> data.frame(tmpDF, V3 = toupper(gsub("[0-9]", "", tmpDF$V1)))
>>
>> Seems to do the trick.
>>
>> Best,
>> Ista
>>
>> On Sat, Jan 3, 2015 at 9:41 PM, npretnar <npretnar at gmail.com> wrote:
>>> I have a string variable (V1) in a data frame structured as follows:
>>>
>>> V1 V2
>>> A 5
>>> a1 1
>>> a2 1
>>> a3 1
>>> a4 1
>>> a5 1
>>> B 4
>>> b1 1
>>> b2 1
>>> b3 1
>>> b4 1
>>>
>>> I want the following:
>>>
>>> V1 V2 V3
>>> a1 1 A
>>> a2 1 A
>>> a3 1 A
>>> a4 1 A
>>> a5 1 A
>>> b1 1 B
>>> b2 1 B
>>> b3 1 B
>>> b4 1 B
>>>
>>> I am not sure how to go about making this transformation besides writing a long vector that contains each of the categorical string names (these are state names, so it would be a really long vector). Any help would be greatly appreciated.
>>>
>>> Thanks,
>>>
>>> Nicholas Pretnar
>>> Mizzou Economics Grad Assistant
>>> npretnar at gmail.com
David Winsemius
Alameda, CA, USA