[R] Help with replace()

David Winsemius
Thu Jul 12 17:28:48 CEST 2018


> On Jul 12, 2018, at 8:17 AM, Bill Poling <Bill.Poling using zelis.com> wrote:
> 
> 
> R version 3.5.1 (2018-07-02) -- "Feather Spray"
> Copyright (C) 2018 The R Foundation for Statistical Computing
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> 
> Hi.
> 
> I have a data set with day, month, and year integers. I am creating a date column from those using lubridate.
> 
> About a hundred rows failed to parse.
> 
> The problem is that some April and September rows have day = 31.
> 
> paste(df1$year, df1$month, df1$day, sep = "-")
> 
> ymd(paste(df1$year, df1$month, df1$day, sep = "-"))#Warning message: 129 failed to parse. As expected in tutorial
> 
> #The resulting Date vector can be added to df1 as a new column called date:
> df1$date <- ymd(paste(df1$year, df1$month, df1$day, sep = "-"))#Same warning
> 
> 
> head(df1)
> sapply(df1$date,class) #"date"
> summary(df1$date)
> # Min.      1st Qu.       Median         Mean      3rd Qu.         Max.         NA's
> #"1977-07-16" "1984-03-12" "1990-07-22" "1990-12-15" "1997-07-29" "2002-12-31"        "129"
> 
> is_missing_date <- is.na(df1$date)
> View(is_missing_date)
> 
> date_columns <- c("year", "month", "day")
> missing_dates <- df1[is_missing_date,  date_columns]
> 
> head(missing_dates)
> #      year month day
> # 3144 2000     9  31
> # 3817 2000     4  31
> # 3818 2000     4  31
> # 3819 2000     4  31
> # 3820 2000     4  31
> # 3856 2000     9  31
> 
> I am trying to replace those with 30.

Seems like a fairly straightforward application of "[<-" with a logical index. (No need for tidyverse.)

 missing_dates$day[ missing_dates$day==31 & ( missing_dates$month %in% c(4,9) )] <- 30


> missing_dates
     year month day
3144 2000     9  30
3817 2000     4  30
3818 2000     4  30
3819 2000     4  30
3820 2000     4  30
3856 2000     9  30
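
The same assignment can be made on the full data frame instead of the extracted subset, after which the date column can be rebuilt. A minimal sketch, assuming a small made-up df1 with year, month and day columns; base as.Date() stands in for lubridate's ymd() here only so the example has no package dependency:

```r
# Hypothetical stand-in for df1 (the real data is not shown in the post)
df1 <- data.frame(
  year  = c(2000, 2000, 2000, 2000),
  month = c(4, 9, 5, 4),
  day   = c(31, 31, 31, 15)
)

# Rows where day is 31 in a month (April, September) that has only 30 days
bad <- df1$day == 31 & df1$month %in% c(4, 9)

# Indexed assignment ("[<-") on the day column of the full data frame
df1$day[bad] <- 30

# Rebuild the date column; as.Date() gives NA for impossible dates
df1$date <- as.Date(paste(df1$year, df1$month, df1$day, sep = "-"),
                    format = "%Y-%m-%d")

# Count of dates that still fail to parse
n_failed <- sum(is.na(df1$date))
n_failed
```

If June or November could also carry day 31, c(4, 6, 9, 11) would cover all four 30-day months.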

Best;
David.

> 
> I am all over the map in Google looking for a fix, but haven't found one. I am sure I have overcomplicated my attempts with ideas (below) from these and other sites.
> 
> https://stackoverflow.com/questions/14737773/replacing-occurrences-of-a-number-in-multiple-columns-of-data-frame-with-another?noredirect=1&lq=1
> https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/replace
> https://stackoverflow.com/questions/48714625/error-in-data-frame-unused-argument
> The following are screwy attempts at this simple repair,
> 
> ??mutate_if
> 
> ??replace
> 
> is_missing_date <- is.na(df1$date)
> View(is_missing_date)
> 
> date_columns <- c("year", "month", "day")
> missing_dates <- df1[is_missing_date,  date_columns]
> 
> head(missing_dates)
> #      year month day
> # 3144 2000     9  31
> # 3817 2000     4  31
> # 3818 2000     4  31
> # 3819 2000     4  31
> # 3820 2000     4  31
> # 3856 2000     9  31
> 
> #So need those months with 30 days that are 31 to be 30
> View(missing_dates)
> 
> install.packages("dplyr")
> library(dplyr)
> 
> 
> View(missing_dates)
> # ..those were the values you're going to replace
> 
> I thought this function from Stack Overflow would work, but I get an error when I try to add a filter
> 
> #https://stackoverflow.com/questions/14737773/replacing-occurrences-of-a-number-in-multiple-columns-of-data-frame-with-another?noredirect=1&lq=1
> df.Rep <- function(.data_Frame, .search_Columns, .search_Value, .sub_Value){
>  .data_Frame[, .search_Columns] <- ifelse(.data_Frame[, .search_Columns]==.search_Value,.sub_Value/.search_Value,1) * .data_Frame[, .search_Columns]
>  return(.data_Frame)
> }
> 
> df.Rep(missing_dates, 3, 31, 30)
> 
> #--So I should be able to apply this to the complete df1 data somehow?
> head(df1)
> df.Rep(df1, filter(month == c(4,9)), 31, 30)
> #Error in month == c(4, 9)  :   comparison (1) is possible only for atomic and list types
> 
> 
> Other screwy attempts:
> 
> 
> select(df1, month, day, year)
> str(df1)
> #'data.frame':   34786 obs. of  14 variables:
> #To choose rows, use filter():
> 
> #mutate_if(df1, month =4,9), day = 30)
> 
> 
> filter(df1, month == c(4,9), day == 31)
> 
> df1 %>%
>  group_by(month == c(4,9), day == 31) %>%
>  tally()
> # 1 FALSE              FALSE       31161
> # 2 FALSE              TRUE          576
> # 3 TRUE               FALSE        2981
> # 4 TRUE               TRUE           68
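
One thing worth noting about the attempts above: month == c(4,9) is not a membership test. "==" recycles the shorter vector along the longer one, so the elements are compared with 4 and 9 alternately, whereas %in% asks whether each element is either value. A small sketch with a made-up vector (not taken from df1):

```r
month <- c(4, 9, 9, 4, 5, 9)   # hypothetical values

# Recycling: compared element-wise against 4, 9, 4, 9, 4, 9
recycled <- month == c(4, 9)
recycled
# TRUE TRUE FALSE FALSE FALSE TRUE

# Membership: TRUE wherever month is 4 or 9
member <- month %in% c(4, 9)
member
# TRUE TRUE TRUE TRUE FALSE TRUE
```

That may be why the tally above finds only 68 TRUE/TRUE rows although 129 dates failed to parse.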
> 
>  df1 %>%
>  mutate(day=replace(day, month == c(4,9), 30)) %>%
>  as.data.frame()
>  View(as.list(df1, month == 4))
>  View(df1, month == c(4,9), day == 31)
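
For the replace() route, the index has to carry both conditions: as written, the call above would set day to 30 in every row it matches, whatever the day was, and the result is never assigned back to df1. A sketch on made-up data, in base R only (the same replace() call should also work inside mutate()):

```r
# Hypothetical data, not from the post
df <- data.frame(month = c(4, 9, 4, 5), day = c(31, 31, 15, 31))

# replace(x, list, values): here the index is a logical vector
# that is TRUE only for day 31 in April or September
df$day <- replace(df$day, df$day == 31 & df$month %in% c(4, 9), 30)

df$day
# 30 30 15 31
```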
> 
> 
> df1 %>%
>  group_by(month == c(4,9), day == 31) %>%
>  tally()
> View(df1, month == c(4,9))
> 
> # df1 %>%
> #   group_by(month == c(4,9), day == 30) %>%
> 
> 
> I know there is a simple solution, and it is driving me mad that it eludes me, even allowing for my being new to R.
> 
> Thank you for any advice.
> 
> WHP

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law



