[R] Help with replace()
David Winsemius
dw|n@em|u@ @end|ng |rom comc@@t@net
Thu Jul 12 17:28:48 CEST 2018
> On Jul 12, 2018, at 8:17 AM, Bill Poling <Bill.Poling using zelis.com> wrote:
>
>
> R version 3.5.1 (2018-07-02) -- "Feather Spray"
> Copyright (C) 2018 The R Foundation for Statistical Computing
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> Hi.
>
> I have a data set with day, month, and year integers. I am creating a date column from those using lubridate.
>
> A hundred or so rows failed to parse.
>
> The problem is April and September have day = 31.
>
> paste(df1$year, df1$month, df1$day, sep = "-")
>
> ymd(paste(df1$year, df1$month, df1$day, sep = "-"))#Warning message: 129 failed to parse. As expected in tutorial
>
> #The resulting Date vector can be added to df1 as a new column called date:
> df1$date <- ymd(paste(df1$year, df1$month, df1$day, sep = "-"))#Same warning
>
>
> head(df1)
> sapply(df1$date,class) #"Date"
> summary(df1$date)
> # Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
> #"1977-07-16" "1984-03-12" "1990-07-22" "1990-12-15" "1997-07-29" "2002-12-31" "129"
>
> is_missing_date <- is.na(df1$date)
> View(is_missing_date)
>
> date_columns <- c("year", "month", "day")
> missing_dates <- df1[is_missing_date, date_columns]
>
> head(missing_dates)
> # year month day
> # 3144 2000 9 31
> # 3817 2000 4 31
> # 3818 2000 4 31
> # 3819 2000 4 31
> # 3820 2000 4 31
> # 3856 2000 9 31
>
> I am trying to replace those with 30.
Seems like a fairly straightforward application of "[<-" with a conditional argument. (No need for tidyverse.)
missing_dates$day[ missing_dates$day==31 & ( missing_dates$month %in% c(4,9) )] <- 30
> missing_dates
year month day
3144 2000 9 30
3817 2000 4 30
3818 2000 4 30
3819 2000 4 30
3820 2000 4 30
3856 2000 9 30
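
The same indexed assignment can be run on the full data frame rather than on the extracted subset, after which the date column can be rebuilt. What follows is only a minimal sketch: the four-row df1 is a made-up stand-in for the real data, `bad` is a hypothetical helper name, and base as.Date() is used in place of lubridate's ymd() so the example needs no packages.

```r
# Toy stand-in for df1 (hypothetical values, not the poster's data)
df1 <- data.frame(
  year  = c(2000, 2000, 2000, 2000),
  month = c(4, 9, 5, 4),
  day   = c(31, 31, 31, 15)
)

# Rows whose day is impossible for April or September
bad <- df1$day == 31 & df1$month %in% c(4, 9)

# Logical-index assignment ("[<-") directly on the full data frame
df1$day[bad] <- 30

# Rebuild the date column; as.Date() stands in for lubridate::ymd() here
df1$date <- as.Date(paste(df1$year, df1$month, df1$day, sep = "-"),
                    format = "%Y-%m-%d")

# Count of rows that still fail to parse
sum(is.na(df1$date))
```

June and November also have 30 days, so if those months can carry day 31 in the real data, c(4, 6, 9, 11) would cover them in the same test.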
Best;
David.
>
> I am all over the map in Google looking for a fix, but haven't found one. I am sure I have overcomplicated my attempts with ideas (below) from these and other sites.
>
> https://stackoverflow.com/questions/14737773/replacing-occurrences-of-a-number-in-multiple-columns-of-data-frame-with-another?noredirect=1&lq=1
> https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/replace
> https://stackoverflow.com/questions/48714625/error-in-data-frame-unused-argument
> The following are screwy attempts at this simple repair,
>
> ??mutate_if
>
> ??replace
>
> is_missing_date <- is.na(df1$date)
> View(is_missing_date)
>
> date_columns <- c("year", "month", "day")
> missing_dates <- df1[is_missing_date, date_columns]
>
> head(missing_dates)
> #year month day
> # 3144 2000 9 31
> # 3817 2000 4 31
> # 3818 2000 4 31
> # 3819 2000 4 31
> # 3820 2000 4 31
> # 3856 2000 9 31
>
> #So rows in 30-day months that have day = 31 need day set to 30
> View(missing_dates)
>
> install.packages("dplyr")
> library(dplyr)
>
>
> View(missing_dates)
> # ..those were the values you're going to replace
>
> I thought this function from Stack Overflow would work, but I get an error when I try to add a filter.
>
> #https://stackoverflow.com/questions/14737773/replacing-occurrences-of-a-number-in-multiple-columns-of-data-frame-with-another?noredirect=1&lq=1
> df.Rep <- function(.data_Frame, .search_Columns, .search_Value, .sub_Value){
> .data_Frame[, .search_Columns] <- ifelse(.data_Frame[, .search_Columns]==.search_Value,.sub_Value/.search_Value,1) * .data_Frame[, .search_Columns]
> return(.data_Frame)
> }
>
> df.Rep(missing_dates, 3, 31, 30)
>
> #--So I should be able to apply this to the complete df1 data somehow?
> head(df1)
> df.Rep(df1, filter(month == c(4,9)), 31, 30)
> #Error in month == c(4, 9) : comparison (1) is possible only for atomic and list types
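
A possible reading of that error, offered as an assumption rather than a diagnosis: filter() was called without a data frame, so `month` was not looked up as a column and most likely resolved to a function of that name (lubridate exports one). Comparing a function with `==` fails in just this way. The sketch below reproduces the effect with base R's mean() standing in for `month`; the three-row df1 and the names `msg` and `hits` are invented for illustration.

```r
# No data frame is in scope, so `mean` is the base function itself,
# playing the role that `month` probably played in the quoted call
msg <- tryCatch(mean == c(4, 9), error = function(e) conditionMessage(e))
msg

# With an explicit data frame the name refers to a column and the test works
df1 <- data.frame(month = c(4, 9, 5), day = c(31, 31, 31))
hits <- df1[df1$month %in% c(4, 9) & df1$day == 31, ]
hits
```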
>
>
> Other screwy attempts:
>
>
> select(df1, month, day, year)
> str(df1)
> #'data.frame': 34786 obs. of 14 variables:
> #To choose rows, use filter():
>
> #mutate_if(df1, month =4,9), day = 30)
>
>
> filter(df1, month == c(4,9), day == 31)
>
> df1 %>%
> group_by(month == c(4,9), day == 31) %>%
> tally()
> # 1 FALSE FALSE 31161
> # 2 FALSE TRUE 576
> # 3 TRUE FALSE 2981
> # 4 TRUE TRUE 68
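
The counts above depend on how `==` treats a length-2 right-hand side: c(4,9) is recycled along the column, so odd-numbered rows are compared with 4 and even-numbered rows with 9. `%in%` tests set membership for each element instead. A small sketch on an invented vector (the values and the names `by_equals` and `by_in` are made up for illustration):

```r
month <- c(4, 9, 4, 9, 9, 4)

# `==` recycles c(4, 9) to 4, 9, 4, 9, 4, 9 and compares position by position
by_equals <- month == c(4, 9)
by_equals
# TRUE TRUE TRUE TRUE FALSE FALSE

# `%in%` asks, for each element, whether it is 4 or 9
by_in <- month %in% c(4, 9)
by_in
# TRUE TRUE TRUE TRUE TRUE TRUE
```

The last two elements are April and September values that `==` misses only because of their row position.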
>
> df1 %>%
> mutate(day=replace(day, month == c(4,9), 30)) %>%
> as.data.frame()
> View(as.list(df1, month == 4))
> View(df1, month == c(4,9), day == 31)
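
For completeness, replace() itself can do the job in base R once the condition tests both columns and uses `%in%`. This is a sketch on a made-up four-row df1, not the poster's data. Note that the result has to be assigned back, which the piped attempt above does not do, and that a condition on month alone would set every April and September day to 30, not only the 31s.

```r
# Toy stand-in for df1 (hypothetical values)
df1 <- data.frame(month = c(4, 9, 5, 4), day = c(31, 31, 31, 15))

# replace(x, condition, value): only day 31 in April or September becomes 30
df1$day <- replace(df1$day, df1$day == 31 & df1$month %in% c(4, 9), 30)

df1$day
# 30 30 31 15
```

The same replace() call should also work inside dplyr's mutate(), as in mutate(day = replace(day, day == 31 & month %in% c(4, 9), 30)), provided the result is assigned to df1.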
>
>
> df1 %>%
> group_by(month == c(4,9), day == 31) %>%
> tally()
> View(df1, month == c(4,9))
>
> # df1 %>%
> # group_by(month == c(4,9), day == 30) %>%
>
>
> I know there is a simple solution and it is driving me mad that it eludes me, even allowing for being new to R.
>
> Thank you for any advice.
>
> WHP
David Winsemius
Alameda, CA, USA
'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law