[R] writing a function to work with dplyr::mutate()
John Kane
jrkr|de@u @end|ng |rom gm@||@com
Wed Jan 20 00:07:36 CET 2021
David
library(tidyverse)
char_vec <- sample(c("a", "b", "c"), 10, replace = TRUE)
recode(char_vec, a = "Apple")
works for me.
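For the numeric SEER codes, `recode()` also accepts backtick-quoted numeric names. Here is a minimal sketch using the `test1` values from further down the thread. Unlike the lookup-table functions below, a code above the threshold that is not listed is returned unchanged rather than as `NA`. As far as I know, current dplyr documents `recode()` as superseded, not removed.

```r
library(dplyr)

test1 <- data.frame(old = c(99, 95, 93, 8))

# For a numeric vector, recode() matches the backtick-quoted names as numbers.
# Values with no matching name (here 8) are returned unchanged by default.
recoded <- test1 %>%
  mutate(varb = recode(old, `93` = 3, `95` = 5, `99` = NA_real_))
recoded
```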
On Tue, 19 Jan 2021 at 15:13, David Winsemius <dwinsemius using comcast.net> wrote:
>
> On 1/19/21 11:17 AM, Bill Dunlap wrote:
> > Your translate... function seems unnecessarily complicated and reusing the
> > name 'var' for both the input and the data.frame containing the input
> > makes it confusing to me. The following replacement, f, uses your
> > algorithm but I think gets the answer you want.
>
> I was thinking that the tidyverse might already have a recode-like
> operation. But dplyr::recode appears to be deprecated and you get
> referred to case_when. Perhaps following an example from the `case_when`
> help page:
>
> case_SEER_tsize <- function(tsize, upper, exceptions) {
>   case_when(tsize <= upper ~ tsize,
>             tsize %in% exceptions$bif ~ exceptions$new[match(tsize, exceptions$bif)])
> }
>
> I'm guessing that my lack of tidyversatility means there's probably a
> `match`-equivalent that I'm not familiar with.
>
> > test1 <- data.frame(old = c(99, 95, 93, 8)); lup <- data.frame(bif = c(93, 95, 99),
> +   new = c(3, 5, NA))
> >
> > test1 %>%
> +   mutate(varb = case_SEER_tsize(.$old, 90, lup))
>   old varb
> 1  99   NA
> 2  95    5
> 3  93    3
> 4   8    8
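On the `match` question: base R's `match()` is already vectorised, so the same lookup can be written without `case_when()` at all. This is a sketch under the same conventions as above (`exceptions` has columns `bif` and `new`); `seer_match` is a hypothetical name. Inside `mutate()` the bare column `old` can be passed directly, so `.$old` is not needed.

```r
library(dplyr)

# Hypothetical helper: codes at or below `upper` are kept as-is; codes above
# `upper` are looked up in `exceptions` (columns `bif` and `new`) via match().
# Codes above `upper` that are missing from the table become NA.
seer_match <- function(tsize, upper, exceptions) {
  out <- tsize
  above <- !is.na(tsize) & tsize > upper
  out[above] <- exceptions$new[match(tsize[above], exceptions$bif)]
  out
}

test1 <- data.frame(old = c(99, 95, 93, 8))
lup <- data.frame(bif = c(93, 95, 99), new = c(3, 5, NA))

# Inside mutate() the bare column name `old` is already a plain vector.
res <- test1 %>% mutate(varb = seer_match(old, 90, lup))
res
```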
>
> --
>
> David.
>
> >
> > f <- function(var, upper, lookup) {
> >   names(lookup) <- c('old', 'new')
> >   var_df <- data.frame(old = var)
> >   lookup2 <- data.frame(old = c(1:upper),
> >                         new = c(1:upper))
> >   lookup3 <- rbind(lookup, lookup2)
> >   res <- left_join(var_df, lookup3, by = 'old')
> >   res$new # return a vector, not a data.frame or tibble.
> > }
> > E.g.,
> >> data.frame(XXX = c(95, 93, 10, 20), YYY = c(55, 66, 93, 98)) %>% mutate(YYY_mm = f(YYY, 90, lup))
> >   XXX YYY YYY_mm
> > 1  95  55     55
> > 2  93  66     66
> > 3  10  93      3
> > 4  20  98     NA
> >
> > You can modify this so that it names the output column based on the name
> > of the input column (by returning a data.frame/tibble instead of a numeric
> > vector):
> >
> > f1 <- function(var, upper, lookup,
> >                new_varname = paste0(deparse1(substitute(var)), "_mm")) {
> >   names(lookup) <- c('old', new_varname)
> >   var_df <- data.frame(old = var)
> >   lookup2 <- data.frame(old = c(1:upper),
> >                         new = c(1:upper))
> >   names(lookup2)[2] <- new_varname
> >   lookup3 <- rbind(lookup, lookup2)
> >   res <- left_join(var_df, lookup3, by = 'old')[2]
> >   res
> > }
> > E.g.,
> >> data.frame(XXX = c(95, 93, 10, 20), YYY = c(55, 66, 93, 98)) %>% mutate(f1(YYY, 90, lup))
> >   XXX YYY YYY_mm
> > 1  95  55     55
> > 2  93  66     66
> > 3  10  93      3
> > 4  20  98     NA
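`f1` works because `mutate()` (dplyr 1.0.0 and later) splices an unnamed data-frame result into ordinary columns, taking the new column names from that data frame. Here is a minimal sketch of just that mechanism, with a doubling step standing in for the lookup; the doubling is purely illustrative.

```r
library(dplyr)

df <- data.frame(YYY = c(55, 66, 93))

# An unnamed argument that evaluates to a data frame is spliced into the
# result, so its column name (YYY_mm) becomes the new column's name.
out <- df %>% mutate(data.frame(YYY_mm = YYY * 2))
out
```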
> >
> > -Bill
> >
> > On Tue, Jan 19, 2021 at 10:24 AM Steven Rigatti <sjrigatti using gmail.com> wrote:
> >
> >> I am having some problems with what seems like a pretty simple issue. I
> >> have some data where I want to convert numbers. Specifically, this is
> >> cancer data, and the size of tumors is encoded using millimeter
> >> measurements. However, if the actual measurement is not available, the
> >> coding may imply a less specific range of sizes. For instance, numbers
> >> 0-89 may indicate size in mm, but 90 indicates "greater than 90 mm", 91
> >> indicates "1 to 2 cm", etc. So, I want to translate 91 to 90, 92 to 15,
> >> etc.
> >>
> >> I have many such tables so I would like to be able to write a function
> >> which takes as input a threshold over which new values need to be looked
> >> up, and the new lookup table, returning the new values.
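Since there are many such tables, one option is a small function factory that builds the complete old-to-new table once per lookup table and returns a vectorised translator usable directly inside `mutate()`. This is only a sketch: `make_seer_translator` and `translate_tsize` are hypothetical names, the lookup is assumed to hold the old code in its first column and the replacement in its second, and the identity rows cover `1:upper` as in the code in this thread.

```r
library(dplyr)

# Hypothetical factory: builds the full old -> new table once for a given
# threshold and lookup table, then returns a vectorised translator.
make_seer_translator <- function(upper, lookup) {
  names(lookup) <- c("old", "new")  # assumes: old code first, new value second
  # Exception rows first, then identity rows for 1..upper (as in the thread).
  full <- rbind(lookup, data.frame(old = 1:upper, new = 1:upper))
  function(x) full$new[match(x, full$old)]
}

# Illustrative table with the values used in this thread.
lup <- data.frame(bif = c(93, 95, 99), new = c(3, 5, NA))
translate_tsize <- make_seer_translator(90, lup)

res <- data.frame(old = c(99, 95, 93, 8)) %>%
  mutate(varb = translate_tsize(old))
res
```

Because the exception rows come first in `full`, `match()` picks them over the identity rows if a code appears in both; codes found in neither come back as `NA`.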
> >>
> >> I successfully wrote the function:
> >>
> >> translate_seer_numeric <- function(var, upper, lookup) {
> >>   names(lookup) <- c('old', 'new')
> >>   names(var) <- 'old'
> >>   var <- as.data.frame(var)
> >>   lookup2 <- data.frame(old = c(1:upper),
> >>                         new = c(1:upper))
> >>   lookup3 <- rbind(lookup, lookup2)
> >>   print(var)
> >>   res <- left_join(var, lookup3, by = 'old') %>%
> >>     select(new)
> >>
> >>   res
> >> }
> >>
> >> test1 <- data.frame(old = c(99, 95, 93, 8))
> >> lup <- data.frame(bif = c(93, 95, 99),
> >>                   new = c(3, 5, NA))
> >> translate_seer_numeric(test1, 90, lup)
> >>
> >> The above test generates the desired output:
> >>
> >>   old
> >> 1  99
> >> 2  95
> >> 3  93
> >> 4   8
> >>   new
> >> 1  NA
> >> 2   5
> >> 3   3
> >> 4   8
> >>
> >> My problem comes when I try to put this in line with pipes and the
> >> mutate function:
> >>
> >> test1 %>%
> >>   mutate(varb = translate_seer_numeric(var = old, 90, lup))
> >> ####
> >> Error: Problem with `mutate()` input `varb`.
> >> x Join columns must be present in data.
> >> x Problem with `old`.
> >> i Input `varb` is `translate_seer_numeric(var = test1$old, 90, lup)`.
> >>
> >> Thoughts??
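Reading the error, the failing step seems to be the conversion at the top of `translate_seer_numeric()`. Called as `translate_seer_numeric(test1, ...)`, `var` is a one-column data frame, so `names(var) <- 'old'` renames that column. Inside `mutate()`, however, `var = old` passes the bare numeric column, so the same line only names the first element, and `as.data.frame(var)` takes its column name from the argument, `var`. `left_join(..., by = 'old')` then finds no `old` column. A base-R sketch of just that step, using the values from `test1`:

```r
# Inside mutate(), `var` is the plain numeric column, not a data frame.
var <- c(99, 95, 93, 8)

# This sets element names (only the first becomes "old"), not a column name.
names(var) <- "old"

# as.data.frame() takes the column name from the argument expression: "var".
var_df <- as.data.frame(var)
names(var_df)
```

Building the frame explicitly, as Bill's `f` does with `data.frame(old = var)`, avoids the problem.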
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
--
John Kane
Kingston ON Canada