[R] concatenating columns in data.frame

Jeff Newmiller jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@
Fri Jul 2 07:03:14 CEST 2021


I use parts of the tidyverse frequently, but this post is the best argument I can imagine for learning base R techniques.
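
For comparison, here is a minimal base R sketch of the original task. The sample data and the helper name add_combo() are made up for illustration and are not from the thread; it assumes use_columns is a character vector of existing column names.

```r
# Hypothetical example data; the column names follow Micha's post.
df <- data.frame(A = c("a", "b", "c"),
                 B = c("d", "e", "f"),
                 C = c("g", "h", "i"),
                 D = c("j", "k", "l"))

# add_combo() is an illustrative helper name, not an existing function.
# do.call() hands each selected column to paste() as a separate argument,
# so the concatenation is vectorised over rows. unname() keeps column
# names from being mistaken for paste() arguments such as sep.
add_combo <- function(df, use_columns, colnew = "Combo", sep = "_") {
  cols <- unname(as.list(df[use_columns]))
  df[[colnew]] <- do.call(paste, c(cols, sep = sep))
  df
}

df_combo <- add_combo(df, c("D", "B"))
print(df_combo)
```

No row-by-row loop or rbind() is needed because paste() already works elementwise on whole columns.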

On July 1, 2021 8:41:06 PM PDT, Avi Gross via R-help <r-help using r-project.org> wrote:
>Micha,
>
>Others have provided ways in standard R so I will contribute a somewhat
>odd solution using the dplyr and related packages in the tidyverse
>including a sample data.frame/tibble I made. It requires newer versions
>of R and other packages as it uses some fairly esoteric features
>including "the big bang" and the new ":=" operator and more.
>
>You can use your own data with whatever columns you need, of course.
>
>The goal is to take a tibble with any number of columns and add one
>more column that concatenates, row by row, the contents of the columns
>named in a dynamically supplied vector of quoted column names. First we
>need something to work with, so here is a sample:
>
>#--start
># load required packages, or a bunch at once!
>library(tidyverse)
>
># Pick how many rows you want. For a demo, 3 is plenty
>N <- 3
>
># Make a sample tibble with N rows and the following 4 columns
>mydf <- tibble(alpha = 1:N,
>               beta = letters[1:N],
>               gamma = N:1,
>               delta = month.abb[1:N])
>
># show the original tibble
>print(mydf)
>#--end
>
>In flat text mode, here is the output:
>
>> print(mydf)
># A tibble: 3 x 4
>  alpha beta  gamma delta
>  <int> <chr> <int> <chr>
>1     1 a         3 Jan
>2     2 b         2 Feb
>3     3 c         1 Mar
>
>Now I want to make a function that is used instead of the mutate verb.
>I made a weird one-liner that is a tad hard to explain so first let me
>mention the requirements.
>
>It will take a first argument that is a tibble and in a pipeline this
>would be passed invisibly.
>The second required argument is a vector or list containing the names
>of the columns as strings. A column can be re-used multiple times.
>The third optional argument is what to name the new column with a
>default if omitted.
>The fourth optional argument allows you to choose a different separator
>than "" if you wish.
>
>The function should be usable anywhere in a pipeline, so it should
>return the input tibble with the extra column added.
>
>Here is the function:
>
>my_mutate <- function(df, columns, colnew="concatenated", sep=""){
>  df %>%
>    mutate( "{colnew}" := paste(!!!rlang::syms(columns), sep = sep ))
>}
>
>Yes, the above can be done inline as a long one-liner:
>
>my_mutate <- function(df, columns, colnew="concatenated", sep="")
>mutate(df, "{colnew}" := paste(!!!rlang::syms(columns), sep = sep ))
>
>Here are examples of it running:
>
>
>> choices <- c("beta", "delta", "alpha", "delta")
>> mydf %>% my_mutate(choices, "me2")
># A tibble: 3 x 5
>  alpha beta  gamma delta me2
>  <int> <chr> <int> <chr> <chr>
>1     1 a         3 Jan   aJan1Jan
>2     2 b         2 Feb   bFeb2Feb
>3     3 c         1 Mar   cMar3Mar
>> mydf %>% my_mutate(choices, "me2",":")
># A tibble: 3 x 5
>  alpha beta  gamma delta me2
>  <int> <chr> <int> <chr> <chr>
>1     1 a         3 Jan   a:Jan:1:Jan
>2     2 b         2 Feb   b:Feb:2:Feb
>3     3 c         1 Mar   c:Mar:3:Mar
>> mydf %>% my_mutate(c("beta", "beta", "gamma", "gamma", "delta", "alpha"))
># A tibble: 3 x 5
>  alpha beta  gamma delta concatenated
>  <int> <chr> <int> <chr> <chr>
>1     1 a         3 Jan   aa33Jan1
>2     2 b         2 Feb   bb22Feb2
>3     3 c         1 Mar   cc11Mar3
>> mydf %>% my_mutate(list("beta", "beta", "gamma", "gamma", "delta", "alpha"))
># A tibble: 3 x 5
>  alpha beta  gamma delta concatenated
>  <int> <chr> <int> <chr> <chr>
>1     1 a         3 Jan   aa33Jan1
>2     2 b         2 Feb   bb22Feb2
>3     3 c         1 Mar   cc11Mar3
>> mydf %>% my_mutate(columns=list("alpha", "beta", "gamma", "delta",
>+                                  "gamma", "beta", "alpha"),
>+                    sep="/*/",
>+                    colnew="NewRandomNAME"
>+                    )
># A tibble: 3 x 5
>  alpha beta  gamma delta NewRandomNAME
>  <int> <chr> <int> <chr> <chr>
>1     1 a         3 Jan   1/*/a/*/3/*/Jan/*/3/*/a/*/1
>2     2 b         2 Feb   2/*/b/*/2/*/Feb/*/2/*/b/*/2
>3     3 c         1 Mar   3/*/c/*/1/*/Mar/*/1/*/c/*/3
>
>Does this meet your normal need? Just to show it works in a pipeline,
>here is a variant:
>
>mydf %>%
>  tail(2) %>%
>  my_mutate(c("beta", "beta"), "betabeta") %>%
>  print() %>%
>  my_mutate(list("alpha", "betabeta", "gamma"),
>            "buildson", 
>            "&")
>
>The above keeps only the last two rows of the tibble, makes a doubled
>copy of "beta" under a new name, prints the intermediate result, then
>makes another concatenation that uses the column just created and
>prints the final result.
>
>Here is the run:
>
>> mydf %>%
>+   tail(2) %>%
>+   my_mutate(c("beta", "beta"), "betabeta") %>%
>+   print() %>%
>+   my_mutate(list("alpha", "betabeta", "gamma"),
>+             "buildson",
>+             "&")
># A tibble: 2 x 5
>  alpha beta  gamma delta betabeta
>  <int> <chr> <int> <chr> <chr>
>1     2 b         2 Feb   bb
>2     3 c         1 Mar   cc
># A tibble: 2 x 6
>  alpha beta  gamma delta betabeta buildson
>  <int> <chr> <int> <chr> <chr>    <chr>
>1     2 b         2 Feb   bb       2&bb&2
>2     3 c         1 Mar   cc       3&cc&1
>
>As to how the darn function works, that was a learning experience for
>me to build using features I have not had occasion to use. If anyone
>remains interested, read on. 
>
>The following needs newish features:
>
>	"{colnew}" := SOMETHING
>
>The colon-equals operator in recent versions of dplyr and rlang lets
>the name on the left-hand side be a string in quotes, with a variable
>in curly braces akin to the way glue() does it. The variable colnew is
>evaluated and substituted so the name used for the column is now
>dynamic.
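
A small sketch of that naming trick in isolation, assuming a dplyr recent enough to support glue-style names on the left of ":=". The tibble demo and the column name "doubled" are made up for illustration.

```r
library(dplyr)

demo <- tibble(x = 1:3)   # hypothetical example data
colnew <- "doubled"       # column name chosen at run time

# The string on the left of := is treated as a glue template: {colnew} is
# replaced by the value of the variable colnew, so the new column is
# named "doubled" rather than "colnew".
res <- mutate(demo, "{colnew}" := x * 2)
print(names(res))
```

Writing mutate(demo, colnew = x * 2) instead would create a column literally called colnew, which is why the ":=" form is needed.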
>
>The function does a paste using this:
>
>	!!!rlang::syms(columns)
>
>The problem is paste() wants multiple arguments and we have a single
>argument that is either a character vector or a list. The trick is to
>convert the names into symbols then use "!!!" to splice them, turning
>something like 'c("alpha", "beta", "gamma")' into something more like
>' alpha, beta, gamma ' so that paste sees the columns as multiple
>arguments to concatenate in vector fashion.
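
To see what the splice produces without running it against any data, the call can be built and printed with rlang::expr(). This is only an illustration; the variable names col_syms and call_built are made up here.

```r
library(rlang)

columns <- c("alpha", "beta", "gamma")

# syms() turns each string into a symbol, i.e. an unquoted name.
col_syms <- syms(columns)

# expr() captures a call without evaluating it; !!! splices the list of
# symbols in as separate arguments.
call_built <- expr(paste(!!!col_syms, sep = ""))
print(call_built)
```

The printed call has three separate unquoted arguments, which is the form mutate() ends up evaluating against the columns of the tibble.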
>
>And, the function is not polished but I am sure you can all see some of
>what is needed like checking the arguments for validity, including not
>having a name for the new column that clashes with existing column
>names, doing something sane if no columns to concatenate are offered
>and so on.
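
One possible shape for those checks, as a sketch only. The name my_mutate_checked() and the decision to stop with an error in each case are assumptions, not part of the function shown above.

```r
library(dplyr)
library(rlang)

# my_mutate_checked() is a hypothetical variant of my_mutate() with
# argument checks added; stopping with an error is one choice among several.
my_mutate_checked <- function(df, columns, colnew = "concatenated", sep = "") {
  columns <- as.character(unlist(columns))   # accept a vector or a list
  if (length(columns) == 0) {
    stop("no columns to concatenate were supplied")
  }
  missing_cols <- setdiff(columns, names(df))
  if (length(missing_cols) > 0) {
    stop("unknown column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (colnew %in% names(df)) {
    stop("column '", colnew, "' already exists")
  }
  mutate(df, "{colnew}" := paste(!!!syms(columns), sep = sep))
}

mydf <- tibble(alpha = 1:3, beta = letters[1:3])   # hypothetical data
out <- my_mutate_checked(mydf, c("beta", "alpha"), "ba", "-")
print(out)
```

Calling it with an existing name for colnew, or with a column that is not in the tibble, raises an error instead of silently overwriting or failing inside mutate().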
>
>Just showing a different approach. The base R methods are fine.
>
>- Avi
>
>-----Original Message-----
>From: R-help <r-help-bounces using r-project.org> On Behalf Of Micha Silver
>Sent: Thursday, July 1, 2021 10:36 AM
>To: R-help using r-project.org
>Subject: [R] concatenating columns in data.frame
>
>I need to create a new data.frame column as a concatenation of existing
>character columns. But the number and name of the columns to
>concatenate needs to be passed in dynamically. The code below does what
>I want, but seems very clumsy. Any suggestions how to improve?
>
>
>df = data.frame("A"=sample(letters, 10), "B"=sample(letters, 10),
>"C"=sample(letters,10), "D"=sample(letters, 10))
>
># Which columns to concat:
>
>use_columns = c("D", "B")
>
>
>UpdateCombo = function(df, use_columns) {
>     use_df = df[, use_columns]
>     combo_list = lapply(1:nrow(use_df), function(r) {
>     r_combo = paste(use_df[r,], collapse="_")
>     return(data.frame("Combo" = r_combo))
>     })
>     combo = do.call(rbind, combo_list)
>
>     names(combo) = "Combo"
>
>     return(combo)
>
>}
>
>
>combo_col = UpdateCombo(df, use_columns)
>
>df_combo = do.call(cbind, list(df, combo_col))
>
>
>Thanks
>
>
>--
>Micha Silver
>Ben Gurion Univ.
>Sde Boker, Remote Sensing Lab
>cell: +972-523-665918
>
>______________________________________________
>R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

-- 
Sent from my phone. Please excuse my brevity.


