[R] Separating a Complicated String Vector
William Dunlap
wdunlap at tibco.com
Mon Jan 5 03:21:23 CET 2015
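f <- function (x) {
    # Flag the entries that are state names (case-insensitive match)
    isState <- is.element(tolower(x), tolower(state.name))
    w <- which(isState)
    # Repeat each state once for every city listed after it
    data.frame(State = x[rep(w, diff(c(w, length(x) + 1)) - 1L)],
               City = x[!isState])
}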
E.g.,
V1 <- c("alabama", "bates", "tuscaloosa", "smith", "arkansas", "fayette",
        "little rock", "alaska", "juneau", "nome")
> f(V1)
     State        City
1  alabama       bates
2  alabama  tuscaloosa
3  alabama       smith
4 arkansas     fayette
5 arkansas little rock
6   alaska      juneau
7   alaska        nome
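
For anyone puzzling over the rep()/diff() line, here is a rough sketch of the intermediate values for the example vector. The names nCities, stateIdx and altState are only illustrative and are not part of f(). The cumsum() variant at the end is an alternative formulation added for comparison, not something from the function above. Both assume that no city in the vector shares its name with a state.

```r
# Intermediate values of f() for the example vector (base R only).
V1 <- c("alabama", "bates", "tuscaloosa", "smith", "arkansas", "fayette",
        "little rock", "alaska", "juneau", "nome")

isState <- is.element(tolower(V1), tolower(state.name))  # TRUE where the entry is a state
w <- which(isState)                          # positions of the states: 1 5 8
nCities <- diff(c(w, length(V1) + 1)) - 1L   # cities following each state: 3 2 2
stateIdx <- rep(w, nCities)                  # 1 1 1 5 5 8 8

# Alternative (illustrative): number the groups with cumsum() and
# look up the state belonging to each city's group.
altState <- V1[w][cumsum(isState)[!isState]]

# Both approaches give the same State column.
stopifnot(identical(V1[stateIdx], altState))
```

Appending length(x) + 1 to w acts as a sentinel so that diff() also measures the last group, which has no following state to mark its end.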
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Sat, Jan 3, 2015 at 9:20 PM, npretnar <npretnar at gmail.com> wrote:
> Sorry. Bad example on my part. Try this. V1 is ...
>
> V1
> alabama
> bates
> tuscaloosa
> smith
> arkansas
> fayette
> little rock
> alaska
> juneau
> nome
>
> And I want:
>
> V1        V2
> alabama   bates
> alabama   tuscaloosa
> alabama   smith
> arkansas  fayette
> arkansas  little rock
> alaska    juneau
> alaska    nome
>
> This is more representative of the problem, extended to all 50 states.
>
> - Nick
>
>
> On Jan 3, 2015, at 9:22 PM, Ista Zahn wrote:
>
> > I'm not sure what's so complicated about that (am I missing
> > something?). You can search using grep, and replace using gsub, so
> >
> > tmpDF <- read.table(text="V1 V2
> > A 5
> > a1 1
> > a2 1
> > a3 1
> > a4 1
> > a5 1
> > B 4
> > b1 1
> > b2 1
> > b3 1
> > b4 1",
> > header=TRUE)
> > tmpDF <- tmpDF[grepl("[0-9]", tmpDF$V1), ]
> > data.frame(tmpDF, V3 = toupper(gsub("[0-9]", "", tmpDF$V1)))
> >
> > Seems to do the trick.
> >
> > Best,
> > Ista
> >
> > On Sat, Jan 3, 2015 at 9:41 PM, npretnar <npretnar at gmail.com> wrote:
> >> I have a string variable (V1) in a data frame structured as follows:
> >>
> >> V1 V2
> >> A 5
> >> a1 1
> >> a2 1
> >> a3 1
> >> a4 1
> >> a5 1
> >> B 4
> >> b1 1
> >> b2 1
> >> b3 1
> >> b4 1
> >>
> >> I want the following:
> >>
> >> V1 V2 V3
> >> a1 1 A
> >> a2 1 A
> >> a3 1 A
> >> a4 1 A
> >> a5 1 A
> >> b1 1 B
> >> b2 1 B
> >> b3 1 B
> >> b4 1 B
> >>
> >> I am not sure how to go about making this transformation besides
> >> writing a long vector that contains each of the categorical string names
> >> (these are state names, so it would be a really long vector). Any help
> >> would be greatly appreciated.
> >>
> >> Thanks,
> >>
> >> Nicholas Pretnar
> >> Mizzou Economics Grad Assistant
> >> npretnar at gmail.com
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
>