[R] Decompose df1 into another df2 based on values in df1
Bert Gunter
bgunter@4567 @end|ng |rom gm@||@com
Thu May 27 02:28:14 CEST 2021
Thank you for the reprex. However, your specification was too vague for me
to know exactly what your data are like, so I tried to assume the most
general possibility, with the consequence that I may be giving you an
answer to the wrong question. Hopefully, you can adjust as needed to get
what you want.
I should also warn you that I am nearly certain there are more elegant,
cleverer, faster ways to do this. I just used simple tools. So you may wish
to wait a bit to see whether others can improve on my attempt.
First of all, I assumed the "a2/a3" in S5 in d1 is a typo and it should be
"a2|a3". If it is not a typo, then substitute "\\||\\/" for "\\|" in the
strsplit function in the code that follows.
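To illustrate that substitution, here is a minimal sketch. It uses the character class "[|/]" instead of the alternation above, which I believe is equivalent for this purpose. The two example strings are taken from your d1; the names x and parts are just for illustration.

```r
# Split on either "|" or "/" by using a character class as the pattern
x <- c("a1|a2", "a2/a3")
parts <- strsplit(x, "[|/]")
print(parts)
```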
Secondly, I assumed that your identifiers, "a1" for example, could occur
more than once within a column of your data. If the only possibilities are
0 or 1 times, then the code I provided --in particular the last sapply-- is
more complicated than necessary. A faster approach in that case might be to
use R's outer() function; I leave that as an exercise for you or someone
else to help you with if so.
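For what it is worth, here is a rough sketch of the simpler 0/1 case. Note that it uses %in% rather than outer(), so it is not the outer() approach itself, only one way to get presence/absence indicators. It assumes "a2/a3" is a typo for "a2|a3", and the names ind and getall_fixed are made up for this illustration.

```r
# Your reprex data, with "a2/a3" in S5 assumed to be a typo for "a2|a3"
d1 <- data.frame(
  S1 = c("a1|a2", "b1|b3", "w"),
  S2 = c("w", "b1", "c2"),
  S3 = c("a2", "b3|b4|b1", "c1|c4"),
  S4 = c("w", "b4", "c4"),
  S5 = c("a2|a3", "w", "w"),
  row.names = c("A", "B", "C")
)

# Split on a literal "|" and drop the "w" placeholder
getall_fixed <- function(x) {
  ul <- unlist(strsplit(x, "|", fixed = TRUE))
  ul[ul != "w"]
}

allvals <- lapply(d1, getall_fixed)
uneeks <- sort(unique(unlist(allvals)))

# 1 if the identifier occurs in the column at all, 0 otherwise
ind <- vapply(allvals,
              function(v) as.integer(uneeks %in% v),
              integer(length(uneeks)))
rownames(ind) <- uneeks
print(ind)
```

Unlike the table() version, this records only whether an identifier is present, so repeated occurrences within a column would still give 1.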
Here is my code for your reprex:
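## Split each column's entries on "|" and drop the "w" placeholder
getall <- function(x) {
  ul <- unlist(strsplit(x, "\\|"))
  ul[ul != "w"]
}
allvals <- lapply(d1, getall)            # identifiers found in each column
uneeks <- sort(unique(unlist(allvals)))  # all distinct identifiers, sorted
## Count how often each identifier occurs in each column
sapply(allvals, function(x) table(factor(x, levels = uneeks)))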
## which gives
> sapply(allvals, function(x)table(factor(x, levels = uneeks)))
   S1 S2 S3 S4 S5
a1  1  0  0  0  0
a2  1  0  1  0  1
a3  0  0  0  0  1
b1  1  1  1  0  0
b3  1  0  1  0  0
b4  0  0  1  1  0
c1  0  0  1  0  0
c2  0  1  0  0  0
c4  0  0  1  1  0
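Note that this result has no "b2" row, whereas your res does, because "b2" never occurs in d1. If you need rows for identifiers that are absent from the data, one option is to supply the full set of levels yourself and convert the matrix to a data frame. The sketch below assumes "a2/a3" is a typo for "a2|a3"; the vector wanted and the name res2 are my own inventions for illustration.

```r
# Your reprex data, with "a2/a3" in S5 assumed to be a typo for "a2|a3"
d1 <- data.frame(
  S1 = c("a1|a2", "b1|b3", "w"),
  S2 = c("w", "b1", "c2"),
  S3 = c("a2", "b3|b4|b1", "c1|c4"),
  S4 = c("w", "b4", "c4"),
  S5 = c("a2|a3", "w", "w"),
  row.names = c("A", "B", "C")
)

getall <- function(x) {
  ul <- unlist(strsplit(x, "\\|"))
  ul[ul != "w"]
}
allvals <- lapply(d1, getall)

# Assumed complete identifier set, including "b2", which is not in d1
wanted <- c("a1", "a2", "a3", "b1", "b2", "b3", "b4", "c1", "c2", "c4")

counts <- sapply(allvals, function(x) table(factor(x, levels = wanted)))
res2 <- as.data.frame(counts)
print(res2)
```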
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
On Wed, May 26, 2021 at 2:18 PM Adrian Johnson <oriolebaltimore using gmail.com>
wrote:
> Hello,
>
> I am trying to convert a df (given below as d1) into df2 (given below as
> res).
>
> I tried using loops for each row. I cannot get it right. Moreover the df
> is 250000 x 500 in dimension and I cannot get it to work.
>
> Could anyone help me here please.
>
> Thanks.
> Adrian.
>
> d1 <-
> structure(list(S1 = c("a1|a2", "b1|b3", "w"), S2 = c("w", "b1",
> "c2"), S3 = c("a2", "b3|b4|b1", "c1|c4"), S4 = c("w", "b4", "c4"
> ), S5 = c("a2/a3", "w", "w")), class = "data.frame", row.names = c("A",
> "B", "C"))
>
> res <-
> structure(list(S1 = c(1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L),
> S2 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L), S3 = c(0L,
> 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L), S4 = c(0L, 0L, 0L, 0L,
> 0L, 0L, 1L, 0L, 0L, 1L), S5 = c(0L, 1L, 1L, 0L, 0L, 0L, 0L,
> 0L, 0L, 0L)), class = "data.frame", row.names = c("a1", "a2",
> "a3", "b1", "b2", "b3", "b4", "c1", "c2", "c4"))
>