[R] Duplicate names in the pivot column

Sun Mar 29 08:25:07 CEST 2020

Hi Phil,
Sorry it's not in the environment you are using, but perhaps this will help:

taby<-table(df$y)
ynames<-names(taby)
for(yval in 1:length(taby)) {
 if(taby[yval] > 1) {
  cat(paste(ynames[yval],1:taby[yval],sep=""),"\n")
  df$y[which(df$y == ynames[yval])]<-paste(ynames[yval],1:taby[yval],sep="")
 }
}

Jim

On Sun, Mar 29, 2020 at 12:19 PM <phil using philipsmith.ca> wrote:
>
> I have a problem involving inefficient coding. My code works, but in my
> actual application it takes a very long time to execute. I have included
> a reprex here that uses the same code, but with a much smaller-scale
> application.
>
> The data frame I am working with (df in my reprex) is in long form and I
> want to change it to wide form. My problem is that the pivot column,
> column 2 in my reprex, has some duplicate strings, so the pivot doesn't
> work well (df1 in my reprex). I want to find all the duplicates and tag
> them so they are no longer duplicates. My code succeeds (df3 in my
> reprex). But in the real application there can be over 100 "cases" and
> the for loops grind on far too long.
>
> I encounter this problem frequently in the datasets I use, so I am
> looking for a general solution that is as efficient as possible. Any
> help will be much appreciated.
>
> Philip
>
> ``` r
> library(tidyverse)
> df <- data.frame(time=c(1,1,1,1,1,1,2,2,2,2,2,2),
>                   y=c("A","B","C","B","D","C","A","B","C","B","D","C"),
>                   z=sample(1:100,12,replace=TRUE),stringsAsFactors=FALSE)
> df1 <- pivot_wider(df,id_cols=1,names_from=y,values_from=z)
> #> Warning: Values in `z` are not uniquely identified; output will
> contain list-cols.
> #> * Use `values_fn = list(z = list)` to suppress this warning.
> #> * Use `values_fn = list(z = length)` to identify where the duplicates
> arise
> #> * Use `values_fn = list(z = summary_fun)` to summarise duplicates
> fixcol <- function(dfm,cases,per,s,tag) {
>    # dfm is the data frame
>    # s is the target column number, containing character names
>    # tag is a string to be added to a duplicate name
>    # cases is the number of rows for a single time period
>    # per is the number of time periods
>    # all time periods must have the same number of rows
>    for (k in 1:per) {
>      for (i in (1+(k-1)*cases):(k*cases-1)) {
>        for (j in (i+1):(k*cases)) {
>          if (dfm[j,s]==dfm[i,s]) { # found a duplicate
>            dfm[j,s] <- paste0(dfm[i,s],tag) # fix the duplicate
>            dfm[j,s]
>          }
>        }
>      }
>    }
>    return(dfm)
> }
> df2 <- fixcol(df,6,2,2,"_dup")
> df3 <- pivot_wider(df2,id_cols=1,names_from=y,values_from=z)
> ```
>
> <sup>Created on 2020-03-28 by the [reprex
> package](https://reprex.tidyverse.org) (v0.3.0)</sup>______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.