[R] Duplicate names in the pivot column

Jeff Newmiller
Sun Mar 29 09:24:38 CEST 2020


Does this help?

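df4 <- (   df
       %>% group_by( time, y )                # one group per name within each period
       %>% mutate( lvl = seq.int( n() ) )     # 1 for the first occurrence, 2 for the next, ...
       %>% ungroup()
       %>% mutate( y = ifelse( 1==lvl         # keep first occurrences unchanged
                             , y
                             , paste( y, "dup" )  # tag later repeats
                             )
                 )
       )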
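For comparison, the same grouped-counter idea can be written in base R with `ave()`, which also avoids row-by-row loops and does not need dplyr. This is a minimal sketch rather than part of the original reply. It assumes the `time`/`y`/`z` layout of the reprex quoted below, fixes `z` instead of sampling it so the result is reproducible, and uses the `"_dup"` tag from the original question.

```r
# Assumed sample data with the reprex's layout (z fixed instead of sampled)
df <- data.frame(time = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                 y    = c("A", "B", "C", "B", "D", "C", "A", "B", "C", "B", "D", "C"),
                 z    = 1:12,
                 stringsAsFactors = FALSE)

# Running count of each name within each time period (1, 2, ...), computed per group
lvl <- ave(seq_along(df$y), df$time, df$y, FUN = seq_along)

# Keep the first occurrence of each name; append the tag to later repeats
df$y <- ifelse(lvl == 1, df$y, paste0(df$y, "_dup"))
df$y
```

As with the dplyr version, every repeat gets the same suffix, so a name appearing three or more times in one period would still collide; appending the counter, e.g. `paste0(df$y, "_dup", lvl - 1)`, keeps such names distinct.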

On March 28, 2020 6:18:51 PM PDT, phil at philipsmith.ca wrote:
>I have a problem involving inefficient code. My code works, but in my
>actual application it takes a very long time to execute. I have included
>a reprex that uses the same code on a much smaller scale.
>
>The data frame I am working with (df in my reprex) is in long form, and
>I want to change it to wide form. My problem is that the pivot column
>(column 2 in my reprex) contains some duplicate strings, so the pivot
>doesn't work well (df1 in my reprex). I want to find all the duplicates
>and tag them so they are no longer duplicates. My code succeeds (df3 in
>my reprex), but in the real application there can be over 100 "cases",
>and the nested for loops take far too long.
>
>I encounter this problem frequently in the datasets I use, so I am 
>looking for a general solution that is as efficient as possible. Any 
>help will be much appreciated.
>
>Philip
>
>``` r
>library(tidyverse)
>df <- data.frame(time=c(1,1,1,1,1,1,2,2,2,2,2,2),
>                 y=c("A","B","C","B","D","C","A","B","C","B","D","C"),
>                 z=sample(1:100,12,replace=TRUE),stringsAsFactors=FALSE)
>df1 <- pivot_wider(df,id_cols=1,names_from=y,values_from=z)
>#> Warning: Values in `z` are not uniquely identified; output will contain list-cols.
>#> * Use `values_fn = list(z = list)` to suppress this warning.
>#> * Use `values_fn = list(z = length)` to identify where the duplicates arise
>#> * Use `values_fn = list(z = summary_fun)` to summarise duplicates
>fixcol <- function(dfm,cases,per,s,tag) {
>   # dfm is the data frame
>   # s is the target column number, containing character names
>   # tag is a string to be added to a duplicate name
>   # cases is the number of rows for a single time period
>   # per is the number of time periods
>   # all time periods must have the same number of rows
>   for (k in 1:per) {
>     for (i in (1+(k-1)*cases):(k*cases-1)) {
>       for (j in (i+1):(k*cases)) {
>         if (dfm[j,s]==dfm[i,s]) { # found a duplicate
>           dfm[j,s] <- paste0(dfm[i,s],tag) # fix the duplicate
>         }
>       }
>     }
>   }
>   return(dfm)
>}
>df2 <- fixcol(df,6,2,2,"_dup")
>df3 <- pivot_wider(df2,id_cols=1,names_from=y,values_from=z)
>```
>
><sup>Created on 2020-03-28 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>
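The nested loops in `fixcol()` compare every pair of rows within a period, so the work grows roughly with the square of the number of cases. A vectorized alternative, sketched here as an assumption rather than something proposed in the thread, applies base R's `make.unique()` within each `time` group. It requires dplyr and tidyr (both attached by `library(tidyverse)`), and its suffixes are `_dup1`, `_dup2`, ... rather than the `_dup`, `_dup_dup` naming that `fixcol()` produces.

```r
library(dplyr)
library(tidyr)

# Assumed sample data with the reprex's layout (z fixed instead of sampled)
df <- data.frame(time = rep(c(1, 2), each = 6),
                 y    = rep(c("A", "B", "C", "B", "D", "C"), times = 2),
                 z    = 1:12,
                 stringsAsFactors = FALSE)

# Within each time period, make.unique() leaves first occurrences alone and
# appends "_dup1", "_dup2", ... to later repeats of the same name
df2 <- df %>%
  group_by(time) %>%
  mutate(y = make.unique(y, sep = "_dup")) %>%
  ungroup()

# Every (time, y) pair is now unique, so pivot_wider() yields plain columns
df3 <- pivot_wider(df2, id_cols = time, names_from = y, values_from = z)
df3
```

The result has one row per time period and one column per tagged name, with no list-columns and no warning.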
