[R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
arun
smartpink111 at yahoo.com
Sun Nov 9 16:17:35 CET 2014
Dear Pradip,
>From the documentation of ?max:
The minimum and maximum of a numeric empty set are ‘+Inf’ and
‘-Inf’
One of the rows in your dataset is all `NAs.` I am not sure you want to keep that row with all NAs. You could remove it and run the code or keep it and run with that warning.
data1 <- data1[rowSums(is.na(data1[,-1]))!=4,]
data1 %>%
rowwise()%>%
mutate(oldflag= as.Date(max(mrjdate, cocdate, inhdate, haldate, na.rm=TRUE), origin='1970-01-01')
A.K.
On Sunday, November 9, 2014 9:16 AM, "Muhuri, Pradip (SAMHSA/CBHSQ)" <Pradip.Muhuri at samhsa.hhs.gov> wrote:
Dear Arun,
Thank you so much for sending me the dplyr/mutate() solution to my code. But, I am getting the following warning message. Any suggestions on how to avoid this message?
Pradip
Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
no non-missing arguments to max; returning -Inf
#################################################################
data1 %>%
+
+ rowwise() %>%
+ mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
+ na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: <by row>
id mrjdate cocdate inhdate haldate oldflag
1 1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2 2 <NA> <NA> <NA> <NA> <NA>
3 3 2009-10-24 <NA> 2011-10-13 <NA> 2011-10-13
4 4 2007-10-10 <NA> <NA> <NA> 2007-10-10
5 5 2006-09-01 2005-08-10 <NA> <NA> 2006-09-01
6 6 2007-09-04 2011-10-05 <NA> <NA> 2011-10-05
7 7 2005-10-25 <NA> <NA> 2011-11-04 2011-11-04
Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
no non-missing arguments to max; returning -Inf
Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
-----Original Message-----
Sent: Sunday, November 09, 2014 7:00 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help at r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
You could try
library(dplyr)
data1 %>%
rowwise() %>%
mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: <by row>
id mrjdate cocdate inhdate haldate oldflag
1 1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2 2 <NA> <NA> <NA> <NA> <NA>
3 3 2009-10-24 <NA> 2011-10-13 <NA> 2011-10-13
4 4 2007-10-10 <NA> <NA> <NA> 2007-10-10
5 5 2006-09-01 2005-08-10 <NA> <NA> 2006-09-01
6 6 2007-09-04 2011-10-05 <NA> <NA> 2011-10-05
7 7 2005-10-25 <NA> <NA> 2011-11-04 2011-11-04
A.K.
On Saturday, November 8, 2014 11:42 PM, "Muhuri, Pradip (SAMHSA/CBHSQ)" <Pradip.Muhuri at samhsa.hhs.gov> wrote:
Hello,
The example data frame in the reproducible code below has 5 columns (1 column for id and 4 columns for dates), and there are 7 observations. I would like to insert the most recent date from those 4 date columns into a new column (oiddate) using the mutate() function in the dplyr package. I am getting correct results (NA in the new column) if a given row has all NA's in the four columns. However, the issue is that the date value inserted into the new column (oidflag) is incorrect for 5 of the remaining 6 rows (with a non-NA value in at least 1 of the four columns).
I would appreciate receiving your help toward resolving the issue. Please see the R console and the R script (reproducible example)below.
Thanks in advance.
Pradip
###### from the console ########
print (data2)
id mrjdate cocdate inhdate haldate oidflag
1 1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04
2 2 <NA> <NA> <NA> <NA> <NA>
3 3 2009-10-24 <NA> 2011-10-13 <NA> 2011-11-04
4 4 2007-10-10 <NA> <NA> <NA> 2011-11-04
5 5 2006-09-01 2005-08-10 <NA> <NA> 2011-11-04
6 6 2007-09-04 2011-10-05 <NA> <NA> 2011-11-04
7 7 2005-10-25 <NA> <NA> 2011-11-04 2011-11-04
################## Reproducible code and data #####################################
library(dplyr)
library(lubridate)
library(zoo)
# data object - description of the
temp <- "id mrjdate cocdate inhdate haldate
1 2004-11-04 2008-07-18 2005-07-07 2007-11-07
2 NA NA NA NA
3 2009-10-24 NA 2011-10-13 NA
4 2007-10-10 NA NA NA
5 2006-09-01 2005-08-10 NA NA
6 2007-09-04 2011-10-05 NA NA
7 2005-10-25 NA NA 2011-11-04"
# read the data object
data1 <- read.table(textConnection(temp),
colClasses=c("character", "Date", "Date", "Date", "Date"),
header=TRUE, as.is=TRUE
)
# create a new column
data2 <- mutate(data1,
oidflag= ifelse(is.na(mrjdate) & is.na(cocdate) & is.na(inhdate) & is.na(haldate), NA,
max(mrjdate, cocdate, inhdate, haldate,na.rm=TRUE )
)
)
# convert to date
data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")
# print records
print (data2)
Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list