[R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Muhuri, Pradip (SAMHSA/CBHSQ) Pradip.Muhuri at samhsa.hhs.gov
Sun Nov 9 20:44:55 CET 2014


Hi Arun and Dennis,

This is just an FYI.

You're right - In one row, there are all NA's in  the four  "date" columns.  I have tested below the "TRUEness" of the condition Arun has set.

is.logical(data1[rowSums(is.na(data1[,-1]))!=4,])
[1] FALSE

All these 3 approaches below provide the exact same results.

# Approach 1 (suggested by Arun): The code gives the expected results, but with a warning message.
data1 %>% 

   rowwise() %>%
   mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
                             na.rm=TRUE), origin='1970-01-01'))

# Approach 2: This code (suggested by Dan) does not provide now a warning message although it provided such message earlier.
data2x <- within(data1, oidflag <- apply(data1[,-1], 1, max, na.rm=TRUE))


# Approach 2: This code (suggested by Mark) does not provide a warning message
data2 <- data1
data2$oidflag <- as.Date(sapply(seq_along(data2$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")


##########################  ends here ################

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com] 
Sent: Sunday, November 09, 2014 10:18 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help at r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)



Dear Pradip,

From the documentation of ?max: 


   The minimum and maximum of a numeric empty set are ‘+Inf’ and
        ‘-Inf’ 

One of the rows in your dataset is all `NAs.`  I am not sure you want to keep that row with all NAs.  You could remove it and run the code or keep it and run with that warning.

data1 <- data1[rowSums(is.na(data1[,-1]))!=4,]

data1 %>% 

      rowwise()%>%
      mutate(oldflag= as.Date(max(mrjdate, cocdate, inhdate, haldate, na.rm=TRUE), origin='1970-01-01')


A.K.
On Sunday, November 9, 2014 9:16 AM, "Muhuri, Pradip (SAMHSA/CBHSQ)" <Pradip.Muhuri at samhsa.hhs.gov> wrote:



Dear Arun,

Thank you so much for sending me the dplyr/mutate() solution to my code.    But,  I am getting the following warning message.  Any suggestions on how to avoid this message?

Pradip

Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf


#################################################################
data1 %>% 
+  
+   rowwise() %>%
+   mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
+                              na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: <by row>

  id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf


Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com] 
Sent: Sunday, November 09, 2014 7:00 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help at r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

You could try

library(dplyr)
data1 %>% 

      rowwise() %>%
       mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
                                       na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: <by row>

id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04

A.K.


On Saturday, November 8, 2014 11:42 PM, "Muhuri, Pradip (SAMHSA/CBHSQ)" <Pradip.Muhuri at samhsa.hhs.gov> wrote:
Hello,



The example data frame in the reproducible code below has 5 columns (1 column for id and 4 columns for dates), and there are 7 observations.  I would like to insert the most recent date from those 4 date columns into a new column (oiddate) using the mutate() function in the dplyr package.   I am getting correct results (NA in the new column) if a given row has all NA's in the four columns.  However, the issue is that the date value inserted into the new column (oidflag) is incorrect for 5 of the remaining 6 rows (with a non-NA value in at least 1 of the four columns).



I would appreciate receiving your help toward resolving the issue.  Please see the R console and the R script (reproducible example)below.



Thanks in advance.



Pradip





######  from the console ########

print (data2)

  id    mrjdate    cocdate    inhdate    haldate    oidflag

1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04

2  2       <NA>       <NA>       <NA>       <NA>       <NA>

3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-11-04

4  4 2007-10-10       <NA>       <NA>       <NA> 2011-11-04

5  5 2006-09-01 2005-08-10       <NA>       <NA> 2011-11-04

6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-11-04

7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04





##################  Reproducible code and data #####################################



library(dplyr)

library(lubridate)

library(zoo)

# data object - description of the



temp <- "id  mrjdate cocdate inhdate haldate

1     2004-11-04 2008-07-18 2005-07-07 2007-11-07

2             NA         NA         NA         NA

3     2009-10-24         NA 2011-10-13         NA

4     2007-10-10         NA         NA         NA

5     2006-09-01 2005-08-10         NA         NA

6     2007-09-04 2011-10-05         NA         NA

7     2005-10-25         NA         NA 2011-11-04"



# read the data object



data1 <- read.table(textConnection(temp),

                    colClasses=c("character", "Date", "Date", "Date", "Date"),

                    header=TRUE, as.is=TRUE

                    )

# create a new column



data2 <- mutate(data1,

                oidflag= ifelse(is.na(mrjdate) & is.na(cocdate) & is.na(inhdate)  & is.na(haldate), NA,

                                  max(mrjdate, cocdate, inhdate, haldate,na.rm=TRUE )

                                )

                )



# convert to date

data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")



# print records



print (data2)





Pradip K. Muhuri, PhD

SAMHSA/CBHSQ

1 Choke Cherry Road, Room 2-1071

Rockville, MD 20857

Tel: 240-276-1070

Fax: 240-276-1260





    [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


More information about the R-help mailing list