[R] Duplicate names in the pivot column
phii m@iii@g oii phiiipsmith@c@
phii m@iii@g oii phiiipsmith@c@
Sun Mar 29 17:45:40 CEST 2020
Thank you very much, Jim and Jeff. Both of your solutions work
splendidly.
Philip
On 2020-03-29 02:25, Jim Lemon wrote:
> Hi Phil,
> Sorry it's not in the environment you are using, but perhaps this will
> help:
>
> taby<-table(df$y)
> ynames<-names(taby)
> for(yval in 1:length(taby)) {
> if(taby[yval] > 1) {
> cat(paste(ynames[yval],1:taby[yval],sep=""),"\n")
> df$y[which(df$y ==
> ynames[yval])]<-paste(ynames[yval],1:taby[yval],sep="")
> }
> }
>
> Jim
>
> On Sun, Mar 29, 2020 at 12:19 PM <phil using philipsmith.ca> wrote:
>>
>> I have a problem involving inefficient coding. My code works, but in
>> my
>> actual application it takes a very long time to execute. I have
>> included
>> a reprex here that uses the same code, but with a much smaller-scale
>> application.
>>
>> The data frame I am working with (df in my reprex) is in long form and
>> I
>> want to change it to wide form. My problem is that the pivot column,
>> column 2 in my reprex, has some duplicate strings, so the pivot
>> doesn't
>> work well (df1 in my reprex). I want to find all the duplicates and
>> tag
>> them so they are no longer duplicates. My code succeeds (df3 in my
>> reprex). But in the real application there can be over 100 "cases" and
>> the for loops grind on far too long.
>>
>> I encounter this problem frequently in the datasets I use, so I am
>> looking for a general solution that is as efficient as possible. Any
>> help will be much appreciated.
>>
>> Philip
>>
>> ``` r
>> library(tidyverse)
>> df <- data.frame(time=c(1,1,1,1,1,1,2,2,2,2,2,2),
>>
>> y=c("A","B","C","B","D","C","A","B","C","B","D","C"),
>>
>> z=sample(1:100,12,replace=TRUE),stringsAsFactors=FALSE)
>> df1 <- pivot_wider(df,id_cols=1,names_from=y,values_from=z)
>> #> Warning: Values in `z` are not uniquely identified; output will
>> contain list-cols.
>> #> * Use `values_fn = list(z = list)` to suppress this warning.
>> #> * Use `values_fn = list(z = length)` to identify where the
>> duplicates
>> arise
>> #> * Use `values_fn = list(z = summary_fun)` to summarise duplicates
>> fixcol <- function(dfm,cases,per,s,tag) {
>> # dfm is the data frame
>> # s is the target column number, containing character names
>> # tag is a string to be added to a duplicate name
>> # cases is the number of rows for a single time period
>> # per is the number of time periods
>> # all time periods must have the same number of rows
>> for (k in 1:per) {
>> for (i in (1+(k-1)*cases):(k*cases-1)) {
>> for (j in (i+1):(k*cases)) {
>> if (dfm[j,s]==dfm[i,s]) { # found a duplicate
>> dfm[j,s] <- paste0(dfm[i,s],tag) # fix the duplicate
>> dfm[j,s]
>> }
>> }
>> }
>> }
>> return(dfm)
>> }
>> df2 <- fixcol(df,6,2,2,"_dup")
>> df3 <- pivot_wider(df2,id_cols=1,names_from=y,values_from=z)
>> ```
>>
>> <sup>Created on 2020-03-28 by the [reprex
>> package](https://reprex.tidyverse.org)
>> (v0.3.0)</sup>______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list