[R] writing a function to work with dplyr::mutate()

Tue Jan 19 21:13:23 CET 2021

On 1/19/21 11:17 AM, Bill Dunlap wrote:
> Your translate... function seems unnecessarily complicated and reusing the
> name 'var' for both the input and the data.frame containing the input makes
> it confusing to me.  The following replacement, f, uses your algorithm but
> I think gets the answer you want.

I was thinking that the tidyverse might already have a recode-like 
operation. But the dplyr::recode appears to be deprecated and you get 
referred to case_when. Perhaps following an example from the `case_when` 
help page:

case_SEER_tsize <- function(tsize, upper, exceptions){

     case_when(tsize <=upper ~tsize,

               tsize %in% exceptions$bif ~ exceptions$new[match(tsize, 
exceptions$bif)])}

I'm guessing that my lack of tidyversatility means there's probably a 
`match`-equivalent that I'm not familiar with.

 > test1 <- data.frame(old = c(99,95,93, 8));lup <- data.frame(bif = 
c(93, 95, 99),
+                                                            new = c(3, 
5, NA))
 >
 > test1 %>%
+     mutate(varb = case_SEER_tsize(.$old, 90, lup))
   old varb
1  99   NA
2  95    5
3  93    3
4   8    8

-- 

David.

>
> f <-
> function(var, upper, lookup) {
>      names(lookup) <- c('old','new')
>      var_df <- data.frame(old = var)
>      lookup2 <- data.frame(old = c(1:upper),
>                            new = c(1:upper))
>      lookup3 <- rbind(lookup, lookup2)
>      res <- left_join(var_df, lookup3, by = 'old')
>      res$new # return a vector, not a data.frame or tibble.
> }
> E.g.,
>> data.frame(XXX=c(95,93,10,20), YYY=c(55,66,93,98)) %>% mutate( YYY_mm =
> f(YYY, 90, lup))
>    XXX YYY YYY_mm
> 1  95  55     55
> 2  93  66     66
> 3  10  93      3
> 4  20  98     NA
>
> You can modify this so that it names the output column based on the name of
> the input column (by returning a data.frame/tibble instead of a numeric
> vector):
>
> f1 <-
> function(var, upper, lookup,  new_varname =
> paste0(deparse1(substitute(var)), "_mm")) {
>      names(lookup) <- c('old',new_varname)
>      var_df <- data.frame(old = var)
>      lookup2 <- data.frame(old = c(1:upper),
>                            new = c(1:upper))
>      names(lookup2)[2] <- new_varname
>      lookup3 <- rbind(lookup, lookup2)
>      res <- left_join(var_df, lookup3, by = 'old')[2]
>      res
> }
> E.g.,
>> data.frame(XXX=c(95,93,10,20), YYY=c(55,66,93,98)) %>% mutate( f1(YYY,
> 90, lup))
>    XXX YYY YYY_mm
> 1  95  55     55
> 2  93  66     66
> 3  10  93      3
> 4  20  98     NA
>
> -Bill
>
> On Tue, Jan 19, 2021 at 10:24 AM Steven Rigatti <sjrigatti using gmail.com> wrote:
>
>> I am having some problems with what seems like a pretty simple issue. I
>> have some data where I want to convert numbers. Specifically, this is
>> cancer data and the size of tumors is encoded using millimeter
>> measurements. However, if the actual measurement is not available the
>> coding may imply a less specific range of sizes. For instance numbers 0-89
>> may indicate size in mm, but 90 indicates "greater than 90 mm" , 91
>> indicates "1 to 2 cm", etc. So, I want to translate 91 to 90, 92 to 15,
>> etc.
>>
>> I have many such tables so I would like to be able to write a function
>> which takes as input a threshold over which new values need to be looked
>> up, and the new lookup table, returning the new values.
>>
>> I successfully wrote the function:
>>
>> translate_seer_numeric <- function(var, upper, lookup) {
>>      names(lookup) <- c('old','new')
>>      names(var) <- 'old'
>>      var <- as.data.frame(var)
>>      lookup2 <- data.frame(old = c(1:upper),
>>                            new = c(1:upper))
>>      lookup3 <- rbind(lookup, lookup2)
>>   print(var)
>>      res <- left_join(var, lookup3, by = 'old') %>%
>>           select(new)
>>
>>      res
>>
>> }
>>
>> test1 <- data.frame(old = c(99,95,93, 8))lup <- data.frame(bif = c(93, 95,
>> 99),
>>                    new = c(3, 5, NA))
>> translate_seer_numeric(test1, 90, lup)
>>
>> The above test generates the desired output:
>>
>>    old1  992  953  934   8
>>    new1  NA2   53   34   8
>>
>> My problem comes when I try to put this in line with pipes and the mutate
>> function:
>>
>> test1 %>%
>>       mutate(varb = translate_seer_numeric(var = old, 90, lup))####
>>   Error: Problem with `mutate()` input `varb`.
>> x Join columns must be present in data.
>> x Problem with `old`.
>> i Input `varb` is `translate_seer_numeric(var = test1$old, 90, lup)`.
>>
>> Thoughts??
>>
>>          [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.