[R] Split
Bert Gunter
bgunter@4567 @end|ng |rom gm@||@com
Wed Sep 23 01:25:13 CEST 2020
To be clear, I think Rui's solution is perfectly fine and probably better
than what I offer below. But just for fun, I wanted to do it without the
lapply(). Here is one way. I think my comments suffice to explain.
> ## which are the non "_" indices?
> wh <- grep("_",F1$text, fixed = TRUE, invert = TRUE)
> ## paste "_." to these
> F1[wh,"text"] <- paste(F1[wh,"text"],".",sep = "_")
> ## Now strsplit() and unlist() them to get a vector
> z <- unlist(strsplit(F1$text, "_"))
> ## now cbind() to the data frame
> F1 <- cbind(F1, matrix(z, ncol = 2, byrow = TRUE))
> F1
  ID1 ID2   text    1  2
1  A1  B1 NONE_. NONE  .
2  A1  B1  cf_12   cf 12
3  A1  B1 NONE_. NONE  .
4  A2  B2  X2_25   X2 25
5  A2  B3  fd_15   fd 15
> ## You can rename the two new columns ("1" and "2") yourself
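
For anyone who wants to run this end to end, here is a rough, self-contained
sketch of the same idea. It also adds the Y1 indicator and names the new
columns up front. The object names padded, parts and result are my own
placeholders, and the sketch assumes each entry contains at most one "_".

```r
## Rebuild the sample data from the original post
F1 <- read.table(text = "ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)

## indicator: 1 if "text" contains "_", otherwise 0
F1$Y1 <- as.integer(grepl("_", F1$text, fixed = TRUE))

## pad the entries without "_" so that every entry splits into two pieces
padded <- ifelse(F1$Y1 == 1L, F1$text, paste(F1$text, ".", sep = "_"))

## split, then fill a 2-column matrix row by row (assumes one "_" per entry)
parts <- matrix(unlist(strsplit(padded, "_", fixed = TRUE)),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("X1", "X2")))

## drop "text" and attach the named columns
result <- cbind(F1[c("ID1", "ID2", "Y1")], parts)
result
```

Because the matrix carries its own column names, no renaming is needed
afterwards.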
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Tue, Sep 22, 2020 at 12:19 PM Rui Barradas <ruipbarradas using sapo.pt> wrote:
> Hello,
>
> A base R solution with strsplit, like in your code.
>
> F1$Y1 <- +grepl("_", F1$text)
>
> tmp <- strsplit(as.character(F1$text), "_")
> tmp <- lapply(tmp, function(x) if(length(x) == 1) c(x, ".") else x)
> tmp <- do.call(rbind, tmp)
> colnames(tmp) <- c("X1", "X2")
> F1 <- cbind(F1[-3], tmp) # remove the original column
> rm(tmp)
>
> F1
> #  ID1 ID2 Y1   X1 X2
> #1  A1  B1  0 NONE  .
> #2  A1  B1  1   cf 12
> #3  A1  B1  0 NONE  .
> #4  A2  B2  1   X2 25
> #5  A2  B3  1   fd 15
>
>
> Note that cbind dispatches on F1, an object of class "data.frame".
> Therefore it's the method cbind.data.frame that is called and the result
> is also a df, though tmp is a "matrix".
>
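[Inline note: a small sketch of the dispatch point made just above. The
objects df, m, both and mat_only are made up for illustration only.]

```r
## a one-column data frame and a character matrix with named columns
df <- data.frame(ID = c("A1", "A2"), stringsAsFactors = FALSE)
m  <- matrix(c("cf", "X2", "12", "25"), ncol = 2,
             dimnames = list(NULL, c("X1", "X2")))

## one argument is a data frame, so the data frame method is used
both <- cbind(df, m)
is.data.frame(both)

## with matrices only, the result stays a matrix
mat_only <- cbind(as.matrix(df), m)
is.matrix(mat_only)
```
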
>
> Hope this helps,
>
> Rui Barradas
>
>
> At 20:07 on 22/09/20, Rui Barradas wrote:
> > Hello,
> >
> > Something like this?
> >
> >
> > F1$Y1 <- +grepl("_", F1$text)
> > F1 <- F1[c(1, 2, 4, 3)]
> > F1 <- tidyr::separate(F1, text, into = c("X1", "X2"), sep = "_",
> >                       fill = "right")
> > F1
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> > At 19:55 on 22/09/20, Val wrote:
> >> Hi All,
> >>
> >> I am trying to create new columns based on the string content of
> >> another column. First I want to identify the rows that contain a
> >> particular string. If a row contains it, I want to split the string
> >> and create two variables.
> >>
> >> Here is my sample of data.
> >> F1<-read.table(text="ID1 ID2 text
> >> A1 B1 NONE
> >> A1 B1 cf_12
> >> A1 B1 NONE
> >> A2 B2 X2_25
> >> A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)
> >>
> >> If the variable "text" contains "_", I want to create an indicator
> >> variable as shown below
> >>
> >> F1$Y1 <- ifelse(grepl("_", F1$text),1,0)
> >>
> >>
> >> Then I want to split that string into two, before "_" and after "_",
> >> and create two variables as shown below
> >> x1 <- strsplit(as.character(F1$text), "_", fixed = TRUE)
> >>
> >> My problem is how to combine this with the original data frame. The
> >> desired output is shown below,
> >>
> >>
> >> ID1 ID2 Y1 X1   X2
> >> A1  B1  0  NONE .
> >> A1  B1  1  cf   12
> >> A1  B1  0  NONE .
> >> A2  B2  1  X2   25
> >> A2  B3  1  fd   15
> >>
> >> Any help?
> >> Thank you.
> >>
> >> ______________________________________________
> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
>