[R] writing a function to work with dplyr::mutate()

John Kane jrkr|de@u @end|ng |rom gm@||@com
Wed Jan 20 00:07:36 CET 2021


David,

library(tidyverse)
char_vec <- sample(c("a", "b", "c"), 10, replace = TRUE)
recode(char_vec, a = "Apple")

works for me.

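For the numeric codes in the original question, a base-R lookup may be enough. The following is only a minimal sketch, assuming the lookup table has the old code in its first column and the new value in its second; `recode_tsize` is a hypothetical name, not something from dplyr or from the earlier posts.

```r
# Hypothetical helper: values up to `upper` pass through unchanged;
# larger values are looked up in a two-column table (old code, new value).
recode_tsize <- function(x, upper, lookup) {
  idx <- match(x, lookup[[1]])             # row of each x in the lookup, NA if absent
  ifelse(x <= upper, x, lookup[[2]][idx])  # codes above `upper` with no entry become NA
}

lup <- data.frame(bif = c(93, 95, 99), new = c(3, 5, NA))
recode_tsize(c(99, 95, 93, 8), 90, lup)
```

Because it takes a vector and returns a vector of the same length, it should also work inside mutate(), e.g. test1 %>% mutate(varb = recode_tsize(old, 90, lup)).
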
On Tue, 19 Jan 2021 at 15:13, David Winsemius <dwinsemius using comcast.net>
wrote:

>
> On 1/19/21 11:17 AM, Bill Dunlap wrote:
> > Your translate... function seems unnecessarily complicated and reusing the
> > name 'var' for both the input and the data.frame containing the input makes
> > it confusing to me.  The following replacement, f, uses your algorithm but
> > I think gets the answer you want.
>
>
> I was thinking that the tidyverse might already have a recode-like
> operation. But the dplyr::recode appears to be deprecated and you get
> referred to case_when. Perhaps following an example from the `case_when`
> help page:
>
>
> case_SEER_tsize <- function(tsize, upper, exceptions) {
>     case_when(
>         tsize <= upper ~ tsize,
>         tsize %in% exceptions$bif ~ exceptions$new[match(tsize, exceptions$bif)]
>     )
> }
>
>
> I'm guessing that my lack of tidyversatility means there's probably a
> `match`-equivalent that I'm not familiar with.
>
>
>  > test1 <- data.frame(old = c(99, 95, 93, 8))
>  > lup <- data.frame(bif = c(93, 95, 99),
> +                   new = c(3, 5, NA))
>  >
>  > test1 %>%
> +     mutate(varb = case_SEER_tsize(.$old, 90, lup))
>    old varb
> 1  99   NA
> 2  95    5
> 3  93    3
> 4   8    8
>
> --
>
> David.
>
> >
> > f <-
> > function(var, upper, lookup) {
> >      names(lookup) <- c('old','new')
> >      var_df <- data.frame(old = var)
> >      lookup2 <- data.frame(old = c(1:upper),
> >                            new = c(1:upper))
> >      lookup3 <- rbind(lookup, lookup2)
> >      res <- left_join(var_df, lookup3, by = 'old')
> >      res$new # return a vector, not a data.frame or tibble.
> > }
> > E.g.,
> >> data.frame(XXX=c(95,93,10,20), YYY=c(55,66,93,98)) %>%
> >      mutate(YYY_mm = f(YYY, 90, lup))
> >    XXX YYY YYY_mm
> > 1  95  55     55
> > 2  93  66     66
> > 3  10  93      3
> > 4  20  98     NA
> >
> > You can modify this so that it names the output column based on the name of
> > the input column (by returning a data.frame/tibble instead of a numeric
> > vector):
> >
> > f1 <-
> > function(var, upper, lookup,  new_varname =
> > paste0(deparse1(substitute(var)), "_mm")) {
> >      names(lookup) <- c('old',new_varname)
> >      var_df <- data.frame(old = var)
> >      lookup2 <- data.frame(old = c(1:upper),
> >                            new = c(1:upper))
> >      names(lookup2)[2] <- new_varname
> >      lookup3 <- rbind(lookup, lookup2)
> >      res <- left_join(var_df, lookup3, by = 'old')[2]
> >      res
> > }
> > E.g.,
> >> data.frame(XXX=c(95,93,10,20), YYY=c(55,66,93,98)) %>%
> >      mutate(f1(YYY, 90, lup))
> >    XXX YYY YYY_mm
> > 1  95  55     55
> > 2  93  66     66
> > 3  10  93      3
> > 4  20  98     NA
> >
> > -Bill
> >
> > On Tue, Jan 19, 2021 at 10:24 AM Steven Rigatti <sjrigatti using gmail.com>
> > wrote:
> >
> >> I am having some problems with what seems like a pretty simple issue. I
> >> have some data where I want to convert numbers. Specifically, this is
> >> cancer data and the size of tumors is encoded using millimeter
> >> measurements. However, if the actual measurement is not available, the
> >> coding may imply a less specific range of sizes. For instance, numbers
> >> 0-89 may indicate size in mm, but 90 indicates "greater than 90 mm", 91
> >> indicates "1 to 2 cm", etc. So, I want to translate 91 to 90, 92 to 15,
> >> etc.
> >>
> >> I have many such tables so I would like to be able to write a function
> >> which takes as input a threshold over which new values need to be looked
> >> up, and the new lookup table, returning the new values.
> >>
> >> I successfully wrote the function:
> >>
> >> translate_seer_numeric <- function(var, upper, lookup) {
> >>      names(lookup) <- c('old','new')
> >>      names(var) <- 'old'
> >>      var <- as.data.frame(var)
> >>      lookup2 <- data.frame(old = c(1:upper),
> >>                            new = c(1:upper))
> >>      lookup3 <- rbind(lookup, lookup2)
> >>   print(var)
> >>      res <- left_join(var, lookup3, by = 'old') %>%
> >>           select(new)
> >>
> >>      res
> >>
> >> }
> >>
> >> test1 <- data.frame(old = c(99,95,93, 8))
> >> lup <- data.frame(bif = c(93, 95, 99),
> >>                    new = c(3, 5, NA))
> >> translate_seer_numeric(test1, 90, lup)
> >>
> >> The above test generates the desired output:
> >>
> >>   old
> >> 1  99
> >> 2  95
> >> 3  93
> >> 4   8
> >>   new
> >> 1  NA
> >> 2   5
> >> 3   3
> >> 4   8
> >>
> >> My problem comes when I try to put this in line with pipes and the
> >> mutate function:
> >>
> >> test1 %>%
> >>       mutate(varb = translate_seer_numeric(var = old, 90, lup))
> >> Error: Problem with `mutate()` input `varb`.
> >> x Join columns must be present in data.
> >> x Problem with `old`.
> >> i Input `varb` is `translate_seer_numeric(var = test1$old, 90, lup)`.
> >>
> >> Thoughts??
> >>
> >>          [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.

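As far as I can tell, the error in the original post arises because mutate() passes the column to the function as a plain vector, so `names(var) <- 'old'` labels an element rather than a column, and the later join finds no column called `old`. A small base-R illustration (the object names `x` and `named` are made up for this sketch):

```r
x <- c(99, 95, 93, 8)        # inside mutate(), `old` arrives as a plain vector like this

named <- x
names(named) <- "old"        # labels the first element only; this is not a column name
names(named)                 # "old" NA NA NA

names(as.data.frame(named))  # the single column is not called "old"

names(data.frame(old = x))   # naming the column explicitly, as in Bill's f()
```

That would explain why Bill's f(), which builds the data frame with data.frame(old = var), works inside mutate() while the original function only works when handed a data frame.
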

-- 
John Kane
Kingston ON Canada
