[R] Help with replace()
Uwe Ligges
Sat Jul 14 08:55:32 CEST 2018
On 12.07.2018 18:09, Bill Poling wrote:
> Yes, that's got it! (20 years from now I'll have it all figured out UGH!), lol!
Having used R for 20 years myself now, I can only say that it takes much longer.
Best,
Uwe Ligges
> Thank you David
>
>         Min.      1st Qu.       Median         Mean      3rd Qu.         Max.
> "1977-07-16" "1984-03-13" "1990-08-16" "1990-12-28" "1997-07-29" "2002-12-31"
>
> WHP
>
> From: David Winsemius [mailto:dwinsemius using comcast.net]
> Sent: Thursday, July 12, 2018 11:29 AM
> To: Bill Poling <Bill.Poling using zelis.com>
> Cc: r-help (r-help using r-project.org) <r-help using r-project.org>
> Subject: Re: [R] Help with replace()
>
>
>> On Jul 12, 2018, at 8:17 AM, Bill Poling <Bill.Poling using zelis.com> wrote:
>>
>>
>> R version 3.5.1 (2018-07-02) -- "Feather Spray"
>> Copyright (C) 2018 The R Foundation for Statistical Computing
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> Hi.
>>
>> I have a data set with day, month, and year stored as integers. I am creating a date column from those using lubridate.
>>
>> A hundred or so rows failed to parse.
>>
>> The problem is that some April and September rows have day = 31.
>>
>> paste(df1$year, df1$month, df1$day, sep = "-")
>>
>> ymd(paste(df1$year, df1$month, df1$day, sep = "-"))#Warning message: 129 failed to parse. As expected in tutorial
>>
>> #The resulting Date vector can be added to df1 as a new column called date:
>> df1$date <- ymd(paste(df1$year, df1$month, df1$day, sep = "-"))#Same warning
>>
>>
>> head(df1)
>> class(df1$date) # "Date"
>> summary(df1$date)
>> #        Min.      1st Qu.       Median         Mean      3rd Qu.         Max.  NA's
>> #"1977-07-16" "1984-03-12" "1990-07-22" "1990-12-15" "1997-07-29" "2002-12-31" "129"
>>
>> is_missing_date <- is.na(df1$date)
>> View(is_missing_date)
>>
>> date_columns <- c("year", "month", "day")
>> missing_dates <- df1[is_missing_date, date_columns]
>>
>> head(missing_dates)
>> #      year month day
>> # 3144 2000     9  31
>> # 3817 2000     4  31
>> # 3818 2000     4  31
>> # 3819 2000     4  31
>> # 3820 2000     4  31
>> # 3856 2000     9  31
>>
>> I am trying to replace those with 30.
>
> Seems like a fairly straightforward application of "[<-" with a conditional argument. (No need for tidyverse.)
>
> missing_dates$day[ missing_dates$day==31 & ( missing_dates$month %in% c(4,9) )] <- 30
>
>
> > missing_dates
>      year month day
> 3144 2000     9  30
> 3817 2000     4  30
> 3818 2000     4  30
> 3819 2000     4  30
> 3820 2000     4  30
> 3856 2000     9  30
>
> Best;
> David.
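Note that the assignment above changes missing_dates, which is only a copy of the failing rows. A minimal sketch of the same indexed assignment applied to df1 itself, followed by re-parsing, might look like the following. The three-row df1 here is a made-up stand-in for the real data, and base as.Date() with an explicit format is swapped in for lubridate's ymd() so the example runs without extra packages.

```r
# Toy stand-in for df1 (values invented for illustration only)
df1 <- data.frame(year  = c(2000, 2000, 2000),
                  month = c(9, 4, 7),
                  day   = c(31, 31, 31))

# Same "[<-" idea as above, but on df1: only day 31 in April/September is touched
df1$day[df1$day == 31 & df1$month %in% c(4, 9)] <- 30

# Rebuild the date column; with an explicit format, invalid dates become NA
# rather than raising an error (ymd() from lubridate could be used here instead)
df1$date <- as.Date(paste(df1$year, df1$month, df1$day, sep = "-"),
                    format = "%Y-%m-%d")

# Count the rows that still fail to parse
sum(is.na(df1$date))
```

If the count is still above zero on the real data, other impossible dates (for example in June, November, or February) may be present; that is an assumption to check, not something the thread confirms.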
>
>>
>> I am all over the map in Google looking for a fix, but haven't found one. I am sure I have overcomplicated my attempts with ideas (below) from these and other sites.
>>
>> https://stackoverflow.com/questions/14737773/replacing-occurrences-of-a-number-in-multiple-columns-of-data-frame-with-another?noredirect=1&lq=1
>> https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/replace
>> https://stackoverflow.com/questions/48714625/error-in-data-frame-unused-argument
>>
>> The following are screwy attempts at this simple repair:
>>
>> ??mutate_if
>>
>> ??replace
>>
>> is_missing_date <- is.na(df1$date)
>> View(is_missing_date)
>>
>> date_columns <- c("year", "month", "day")
>> missing_dates <- df1[is_missing_date, date_columns]
>>
>> head(missing_dates)
>> #      year month day
>> # 3144 2000     9  31
>> # 3817 2000     4  31
>> # 3818 2000     4  31
>> # 3819 2000     4  31
>> # 3820 2000     4  31
>> # 3856 2000     9  31
>>
>> #So need those months with 30 days that are 31 to be 30
>> View(missing_dates)
>>
>> install.packages("dplyr")
>> library(dplyr)
>>
>>
>> View(missing_dates)
>> # ..those were the values you're going to replace
>>
>> I thought this function from Stack Overflow would work, but I get an error when I try to add a filter.
>>
>> #https://stackoverflow.com/questions/14737773/replacing-occurrences-of-a-number-in-multiple-columns-of-data-frame-with-another?noredirect=1&lq=1
>> df.Rep <- function(.data_Frame, .search_Columns, .search_Value, .sub_Value){
>>   .data_Frame[, .search_Columns] <- ifelse(.data_Frame[, .search_Columns]==.search_Value,.sub_Value/.search_Value,1) * .data_Frame[, .search_Columns]
>>   return(.data_Frame)
>> }
>>
>> df.Rep(missing_dates, 3, 31, 30)
>>
>> #--So I should be able to apply this to the complete df1 data somehow?
>> head(df1)
>> df.Rep(df1, filter(month == c(4,9)), 31, 30)
>> #Error in month == c(4, 9) : comparison (1) is possible only for atomic and list types
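A likely reading of this error: filter() is called here without a data frame, so the bare name month is not the column of df1. With lubridate attached it resolves to the month() function, and R cannot compare a function with a vector. Separately, the second argument of df.Rep() selects columns, so a row condition cannot be passed there. The sketch below reproduces the failure with a hypothetical stand-in for lubridate::month and shows how a row mask is built from the columns explicitly; the toy df1 is invented for illustration.

```r
# Toy stand-in for df1 (values invented for illustration only)
df1 <- data.frame(year = 2000, month = c(4, 9, 5, 4), day = c(31, 31, 31, 15))

# Hypothetical stand-in for lubridate::month, so no package is needed here
month <- function(x) x

# Comparing a function with a vector is an error; capture the message
res <- tryCatch(month == c(4, 9), error = function(e) conditionMessage(e))
print(res)

# Referring to the columns through df1 gives a logical row mask instead
rows <- df1$month %in% c(4, 9) & df1$day == 31
which(rows)  # rows 1 and 2 of the toy data
```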
>>
>>
>> Other screwy attempts:
>>
>>
>> select(df1, month, day, year)
>> str(df1)
>> #'data.frame': 34786 obs. of 14 variables:
>> #To choose rows, use filter():
>>
>> #mutate_if(df1, month =4,9), day = 30)
>>
>>
>> filter(df1, month == c(4,9), day == 31)
>>
>> df1 %>%
>> group_by(month == c(4,9), day == 31) %>%
>> tally()
>> # 1 FALSE FALSE 31161
>> # 2 FALSE TRUE 576
>> # 3 TRUE FALSE 2981
>> # 4 TRUE TRUE 68
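One thing to watch in the tally above: month == c(4,9) does not ask "is month 4 or 9?". The length-2 vector is recycled along the column, so odd-numbered rows are compared with 4 and even-numbered rows with 9, and the counts are probably too low. The %in% operator tests set membership, which is what is meant here. A small sketch with a made-up month vector:

```r
# Made-up months; every element is either 4 or 9
month <- c(4, 9, 9, 4, 4, 9)

# "==" recycles c(4, 9) to 4, 9, 4, 9, 4, 9 and compares position by position
by_eq <- month == c(4, 9)
by_eq       # TRUE TRUE FALSE FALSE TRUE TRUE

# "%in%" asks, for each element, whether it is in the set {4, 9}
by_in <- month %in% c(4, 9)
by_in       # TRUE TRUE TRUE TRUE TRUE TRUE

sum(by_eq)  # 4
sum(by_in)  # 6
```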
>>
>> df1 %>%
>> mutate(day=replace(day, month == c(4,9), 30)) %>%
>> as.data.frame()
>> View(as.list(df1, month == 4))
>> View(df1, month == c(4,9), day == 31)
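The mutate()/replace() attempt above is close. It seems to fall short in three ways: the condition uses == instead of %in%, it does not also require day == 31 (so every matched row would become 30), and the result is never assigned back to df1. A minimal base R sketch of replace() with the full condition, on an invented toy df1:

```r
# Toy stand-in for df1 (values invented for illustration only)
df1 <- data.frame(year = 2000, month = c(4, 9, 5, 4), day = c(31, 31, 31, 15))

# Rows to fix: day 31 in a month that is April or September
bad <- df1$month %in% c(4, 9) & df1$day == 31

# replace(x, condition, value) returns a modified copy, so assign it back
df1$day <- replace(df1$day, bad, 30)
df1$day  # 30 30 31 15

# The same expression should also work inside dplyr, for example:
# df1 <- df1 %>% mutate(day = replace(day, month %in% c(4, 9) & day == 31, 30))
```

The May row keeps day 31 and the April row with day 15 is untouched, which is the point of combining both conditions.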
>>
>>
>> df1 %>%
>> group_by(month == c(4,9), day == 31) %>%
>> tally()
>> View(df1, month == c(4,9))
>>
>> # df1 %>%
>> # group_by(month == c(4,9), day == 30) %>%
>>
>>
>> I know there is a simple solution, and it is driving me mad that it eludes me, even allowing for my being new to R.
>>
>> Thank you for any advice.
>>
>> WHP
>>
>> Confidentiality Notice This message is sent from Zelis. ...{{dropped:15}}
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
>