[R] Split
Bert Gunter
bgunter.4567 using gmail.com
Wed Sep 23 03:47:21 CEST 2020
That was still slower and doesn't quite give what was requested:
> cbind(F1, utils::strcapture("([^_]*)_(.*)", F1$text,
+     proto = data.frame(Before_ = character(), After_ = character())))
  ID1 ID2  text Before_ After_
1  A1  B1  NONE    <NA>   <NA>
2  A1  B1 cf_12      cf     12
3  A1  B1  NONE    <NA>   <NA>
4  A2  B2 X2_25      X2     25
5  A2  B3 fd_15      fd     15
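
For what it's worth, the NA rows above can be patched to match the requested layout (the original text in the first new column, "." in the second, plus the 0/1 indicator Y1). A minimal sketch, assuming the sample F1 from the original post; the column names X1/X2 follow the requested output, and the helper variable no_us is just an illustrative name:

```r
# Sample data from the original post
F1 <- read.table(text = "ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)

# Split at the first underscore; rows without one come back as NA
parts <- utils::strcapture("([^_]*)_(.*)", F1$text,
                           proto = data.frame(X1 = character(),
                                              X2 = character()))

# Guard against factor columns (possible in R versions before 4.0.0)
parts[] <- lapply(parts, as.character)

# Rows where the pattern did not match, i.e. no underscore
no_us <- is.na(parts$X1)

# Fill in the placeholders the original post asked for
parts$X1[no_us] <- F1$text[no_us]
parts$X2[no_us] <- "."

# Add the indicator and drop the original 'text' column
res <- cbind(F1[c("ID1", "ID2")], Y1 = as.integer(!no_us), parts)
res
```

This keeps the single regex pass of strcapture() and only touches the unmatched rows afterwards, so it does not change the timing picture much either way.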
> system.time({
+ cbind(F2, utils::strcapture("([^_]*)_(.*)", F2$text,
+     proto = data.frame(Before_ = character(), After_ = character())))
+ }
+ )
   user  system elapsed
 32.712   0.736  33.587
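
To compare timings on your own machine, something along these lines should work. Note that F2 is not defined in this message, so the sketch builds a stand-in (F2_demo, a hypothetical name) by repeating the sample F1; the real F2 may well have been constructed differently, and timings will vary with machine, data size, and R version:

```r
# Sample data from the original post
F1 <- read.table(text = "ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)

# ASSUMPTION: a larger test frame made by repeating F1; this is a stand-in,
# not the F2 used for the timings quoted in this thread.
n_rep <- 2000
F2_demo <- F1[rep(seq_len(nrow(F1)), n_rep), ]

# Approach 1: strcapture(), one regex with two capture groups
t_capture <- system.time(
  r1 <- utils::strcapture("([^_]*)_(.*)", F2_demo$text,
                          proto = data.frame(Before_ = character(),
                                             After_ = character()))
)

# Approach 2: append "_." to rows without "_", then strsplit() and reshape
t_split <- system.time({
  txt <- F2_demo$text
  wh <- grep("_", txt, fixed = TRUE, invert = TRUE)
  txt[wh] <- paste(txt[wh], ".", sep = "_")
  r2 <- matrix(unlist(strsplit(txt, "_", fixed = TRUE)),
               ncol = 2, byrow = TRUE)
})

# user / system / elapsed for both approaches, side by side
rbind(strcapture = t_capture, strsplit = t_split)[, 1:3]
```

Increase n_rep to see how the gap between the two approaches changes as the data grow.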
Cheers,
Bert
On Tue, Sep 22, 2020 at 5:45 PM Bill Dunlap <williamwdunlap using gmail.com>
wrote:
> Another way to make columns out of the stuff before and after the
> underscore, with NAs if there is no underscore, is
>
> utils::strcapture("([^_]*)_(.*)", F1$text,
> proto=data.frame(Before_=character(), After_=character()))
>
> -Bill
>
> On Tue, Sep 22, 2020 at 4:25 PM Bert Gunter <bgunter.4567 using gmail.com>
> wrote:
>
>> To be clear, I think Rui's solution is perfectly fine and probably better
>> than what I offer below. But just for fun, I wanted to do it without the
>> lapply(). Here is one way. I think my comments suffice to explain.
>>
>> > ## which are the non "_" indices?
>> > wh <- grep("_",F1$text, fixed = TRUE, invert = TRUE)
>> > ## paste "_." to these
>> > F1[wh,"text"] <- paste(F1[wh,"text"],".",sep = "_")
>> > ## Now strsplit() and unlist() them to get a vector
>> > z <- unlist(strsplit(F1$text, "_"))
>> > ## now cbind() to the data frame
>> > F1 <- cbind(F1, matrix(z, ncol = 2, byrow = TRUE))
>> > F1
>>   ID1 ID2   text    1  2
>> 1  A1  B1 NONE_. NONE  .
>> 2  A1  B1  cf_12   cf 12
>> 3  A1  B1 NONE_. NONE  .
>> 4  A2  B2  X2_25   X2 25
>> 5  A2  B3  fd_15   fd 15
>> >## You can change the names of the 2 columns yourself
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along and
>> sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Tue, Sep 22, 2020 at 12:19 PM Rui Barradas <ruipbarradas using sapo.pt>
>> wrote:
>>
>> > Hello,
>> >
>> > A base R solution with strsplit, like in your code.
>> >
>> > F1$Y1 <- +grepl("_", F1$text)
>> >
>> > tmp <- strsplit(as.character(F1$text), "_")
>> > tmp <- lapply(tmp, function(x) if(length(x) == 1) c(x, ".") else x)
>> > tmp <- do.call(rbind, tmp)
>> > colnames(tmp) <- c("X1", "X2")
>> > F1 <- cbind(F1[-3], tmp) # remove the original column
>> > rm(tmp)
>> >
>> > F1
>> > #  ID1 ID2 Y1   X1 X2
>> > #1  A1  B1  0 NONE  .
>> > #2  A1  B1  1   cf 12
>> > #3  A1  B1  0 NONE  .
>> > #4  A2  B2  1   X2 25
>> > #5  A2  B3  1   fd 15
>> >
>> >
>> > Note that cbind dispatches on F1, an object of class "data.frame".
>> > Therefore it's the method cbind.data.frame that is called and the result
>> > is also a df, though tmp is a "matrix".
>> >
>> >
>> > Hope this helps,
>> >
>> > Rui Barradas
>> >
>> >
>> > At 20:07 on 22/09/20, Rui Barradas wrote:
>> > > Hello,
>> > >
>> > > Something like this?
>> > >
>> > >
>> > > F1$Y1 <- +grepl("_", F1$text)
>> > > F1 <- F1[c(1, 2, 4, 3)]
>> > > F1 <- tidyr::separate(F1, text, into = c("X1", "X2"), sep = "_",
>> > >                       fill = "right")
>> > > F1
>> > >
>> > >
>> > > Hope this helps,
>> > >
>> > > Rui Barradas
>> > >
>> > > At 19:55 on 22/09/20, Val wrote:
>> > >> Hi all,
>> > >>
>> > >> I am trying to create new columns based on the string content of
>> > >> another column. First I want to identify rows that contain a
>> > >> particular string. If a row contains it, I want to split the string
>> > >> and create two variables.
>> > >>
>> > >> Here is my sample of data.
>> > >> F1<-read.table(text="ID1 ID2 text
>> > >> A1 B1 NONE
>> > >> A1 B1 cf_12
>> > >> A1 B1 NONE
>> > >> A2 B2 X2_25
>> > >> A2 B3 fd_15 ",header=TRUE,stringsAsFactors=F)
>> > >> If the variable "text" contains "_", I want to create an indicator
>> > >> variable as shown below
>> > >>
>> > >> F1$Y1 <- ifelse(grepl("_", F1$text),1,0)
>> > >>
>> > >>
>> > >> Then I want to split that string into two parts, before "_" and
>> > >> after "_", and create two variables as shown below
>> > >> x1= strsplit(as.character(F1$text),'_',2)
>> > >>
>> > >> My problem is how to combine this with the original data frame. The
>> > >> desired output is shown below,
>> > >>
>> > >>
>> > >> ID1 ID2 Y1 X1   X2
>> > >> A1  B1  0  NONE .
>> > >> A1  B1  1  cf   12
>> > >> A1  B1  0  NONE .
>> > >> A2  B2  1  X2   25
>> > >> A2  B3  1  fd   15
>> > >>
>> > >> Any help?
>> > >> Thank you.
>> > >>
>> > >> ______________________________________________
>> > >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > >> https://stat.ethz.ch/mailman/listinfo/r-help
>> > >> PLEASE do read the posting guide
>> > >> http://www.R-project.org/posting-guide.html
>> > >> and provide commented, minimal, self-contained, reproducible code.
>> > >>
>
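
As a footnote to Rui's remark quoted above about cbind() returning a data frame even though tmp is a matrix: the behaviour is easy to check directly. A minimal sketch with made-up objects (df and m are illustrative names, not from the thread); as far as I know from ?cbind, the data frame method is used whenever at least one argument is a data frame, regardless of its position:

```r
# A small data frame and a character matrix with named columns
df <- data.frame(ID = c("A1", "A2"), stringsAsFactors = FALSE)
m  <- matrix(c("cf", "X2", "12", "25"), ncol = 2,
             dimnames = list(NULL, c("X1", "X2")))

# Data frame first: the result is a data frame
res_df_first <- cbind(df, m)
is.data.frame(res_df_first)

# Matrix first: still a data frame, because one argument is a data frame
res_m_first <- cbind(m, df)
is.data.frame(res_m_first)

# Matrices only: the result stays a matrix
res_m_only <- cbind(m, m)
is.matrix(res_m_only)
```

The matrix columns are spread out into separate data frame columns, which is why the colnames() assignment in Rui's code carries through to the final result.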