[R] Cumulative split of value in data frame column
Bert Gunter
bgunter@4567 @end|ng |rom gm@||@com
Fri Jun 5 20:28:50 CEST 2020
This is a plain text list. In future, please post in plain text so that
your post does not get mangled.
Anyway,...
I don't know about "efficient, optimized", but here's one simple way to do
it using ?strsplit to split and then ?paste to recombine:
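# stringsAsFactors = FALSE keeps FOO as character on R versions before 4.0.0
df <- data.frame(ID = 1:3, FOO = c('A_B','A_B_C','A_B_C_D_E'),
                 stringsAsFactors = FALSE)
# Cumulatively paste the pieces of x back together, one piece at a time.
cumsplit <- function(x, split = "_"){
  w <- x[1]
  # Each new element is the previous cumulative string plus the next piece.
  for(i in seq_along(x)[-1]) w <- c(w, paste(w[i-1], x[i], sep = split))
  w
}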
> lapply(strsplit(df$FOO, split = "_"), cumsplit)
[[1]]
[1] "A" "A_B"
[[2]]
[1] "A" "A_B" "A_B_C"
[[3]]
[1] "A" "A_B" "A_B_C" "A_B_C_D" "A_B_C_D_E"
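The list above still has to be turned into the FOO_SPLIT1 ... FOO_SPLIT5 columns the original post asked for. A minimal sketch of one way to do that, assuming missing entries should be NA; the names pieces, n, wide and result are mine and purely illustrative:

```r
df <- data.frame(ID = 1:3, FOO = c("A_B", "A_B_C", "A_B_C_D_E"),
                 stringsAsFactors = FALSE)

cumsplit <- function(x, split = "_") {
  w <- x[1]
  for (i in seq_along(x)[-1]) w <- c(w, paste(w[i - 1], x[i], sep = split))
  w
}

# One character vector of cumulative pieces per row of df.
pieces <- lapply(strsplit(df$FOO, split = "_", fixed = TRUE), cumsplit)

# Pad every vector with NA up to the longest one, then stack them as rows.
n <- max(lengths(pieces))
wide <- do.call(rbind, lapply(pieces, function(p) { length(p) <- n; p }))
colnames(wide) <- paste0("FOO_SPLIT", seq_len(n))

# Bind the new columns onto the original data frame.
result <- cbind(df, as.data.frame(wide, stringsAsFactors = FALSE))
print(result)
```

Padding with `length(p) <- n` is only one choice; empty strings could be substituted afterwards if NA is not wanted.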
I wouldn't be surprised if clever use of regexes would be faster, but as I
said, this is simple.
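For comparison, here is a sketch of a position-based alternative that avoids the explicit loop. It uses ?gregexpr to locate each delimiter and ?substring to cut the string at those positions. The function name cumsplit_pos is hypothetical, the delimiter is matched literally (fixed = TRUE) rather than as a regular expression, and I have not timed it against the version above:

```r
cumsplit_pos <- function(s, split = "_") {
  # Start positions of every literal occurrence of the delimiter in s;
  # gregexpr() returns -1 when there is no match at all.
  pos <- gregexpr(split, s, fixed = TRUE)[[1]]
  # Each prefix ends just before a delimiter; the last one is the whole string.
  ends <- c(pos[pos > 0] - 1, nchar(s))
  substring(s, 1, ends)
}

FOO <- c("A_B", "A_B_C", "A_B_C_D_E")
res <- lapply(FOO, cumsplit_pos)
print(res)
```

This sketch assumes a single-character delimiter that never appears at the very start of a string; otherwise the first prefix would be empty.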
Bert Gunter
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Fri, Jun 5, 2020 at 9:33 AM Ravi Jeyaraman <ravi76 using gmail.com> wrote:
> Assuming, I have a data frame like this ..
>
> df <- data.frame(ID=1:3, FOO=c('A_B','A_B_C','A_B_C_D_E'))
>
> I want to do a 'cumulative split' of the values in column FOO based on the
> delimiter '_'. The end result should be like this ..
>
> ID FOO       FOO_SPLIT1 FOO_SPLIT2 FOO_SPLIT3 FOO_SPLIT4 FOO_SPLIT5
> 1  A_B       A          A_B
> 2  A_B_C     A          A_B        A_B_C
> 3  A_B_C_D_E A          A_B        A_B_C      A_B_C_D    A_B_C_D_E
>
> Any efficient, optimized way to do this?