[R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

David Winsemius dwinsemius at comcast.net
Thu Dec 4 02:14:21 CET 2014


On Dec 3, 2014, at 2:10 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hello,
> 
> Two alternative approaches, mutate() vs. sapply(), were used to get the desired result (i.e., creating a new column holding the most recent of 4 dates) with help from Arun and Mark on this forum. I now find that the two data objects (created using the two different approaches) are not identical, although the printed results are exactly the same.
> 
> identical(new1, new2) 
> [1] FALSE
> 

You should have examined the output from dput() on both objects. I think you will find that dplyr is adding new attributes.

Notice that the "mutate()-ed" object now has this class:

class = c("rowwise_df", "tbl_df", "tbl", "data.frame")

Moral: Never rely on the print representation.
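
A minimal sketch of the kind of check I mean, in base R only so that it runs without dplyr. The data frame and the class name "extra_class" are made up for illustration; they stand in for new2 and for the classes dplyr attaches. Note also that the posted example names the new column oldflag in new1 but oiddate in new2, so names() would differ there as well.

```r
# Two objects holding the same data
a <- data.frame(id = c("1", "2"),
                d  = as.Date(c("2008-07-18", NA)),
                stringsAsFactors = FALSE)
b <- a
# Hypothetical extra class, standing in for what dplyr adds
class(b) <- c("extra_class", "data.frame")

same_object <- identical(a, b)                # compares attributes too
extra_class <- setdiff(class(b), class(a))    # which classes differ
same_names  <- identical(names(a), names(b))  # column names
# as.data.frame() strips any classes listed ahead of "data.frame"
same_after  <- identical(a, as.data.frame(b))

print(c(same_object = same_object, same_names = same_names,
        same_after = same_after))
print(extra_class)
```

Comparing class() and names() first usually narrows things down faster than reading two full dput() dumps side by side.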

-- 
David.


> Please see the reproducible example below.
> 
> I don't understand why the code returns FALSE here. Any hints/comments will be appreciated.
> 
> Thanks,
> 
> Pradip
> 
> #############################################  reproducible example ########################################
> library(dplyr)
> # data object - description 
> 
> temp <- "id  mrjdate cocdate inhdate haldate
> 1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
> 2             NA         NA         NA         NA     
> 3     2009-10-24         NA 2011-10-13         NA
> 4     2007-10-10         NA         NA         NA
> 5     2006-09-01 2005-08-10         NA         NA
> 6     2007-09-04 2011-10-05         NA         NA
> 7     2005-10-25         NA         NA 2011-11-04"
> 
> # read the data object
> 
> example.data <- read.table(textConnection(temp), 
>                    colClasses=c("character", "Date", "Date", "Date", "Date"),  
>                    header=TRUE, as.is=TRUE
>                    )
> 
> 
> # create a new column -dplyr solution (Acknowledgement: Arun)
> 
> new1 <- example.data %>%
>     rowwise() %>%
>     mutate(oldflag = as.Date(max(mrjdate, cocdate, inhdate, haldate,
>                                  na.rm = TRUE), origin = '1970-01-01'))
> 
> # create a new column - Base R solution (Acknowledgement: Mark Sharp)
> 
> new2 <- example.data
> new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
>  if (all(is.na(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')])))) {
>    max_d <- NA
>  } else {
>    max_d <- max(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')]), na.rm = TRUE)
>  }
>  max_d}),
>  origin = "1970-01-01")
> 
> identical(new1, new2) 
> 
> # print records
> 
> print (new1); print(new2)
> 
> Pradip K. Muhuri
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
> 
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Muhuri, Pradip (SAMHSA/CBHSQ)
> Sent: Sunday, November 09, 2014 6:11 AM
> To: 'Mark Sharp'
> Cc: r-help at r-project.org
> Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
> 
> Hi Mark,
> 
> Your code has also given me the results I expected.  Thank you so much for your help.
> 
> Regards,
> 
> Pradip
> 
> Pradip K. Muhuri, PhD
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
> 
> 
> -----Original Message-----
> From: Mark Sharp [mailto:msharp at TxBiomed.org] 
> Sent: Sunday, November 09, 2014 3:01 AM
> To: Muhuri, Pradip (SAMHSA/CBHSQ)
> Cc: r-help at r-project.org
> Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
> 
> Pradip,
> 
> mutate() operates on each column as a whole vector, so max() inside it returns the maximum over the entire data set rather than one value per row.
> 
> I am almost certain there is some nice way to handle this, but the sapply() function is a standard approach.
> 
> max() does not want a data frame, hence the use of unlist().
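
To make the point above concrete, here is a small sketch on made-up data (the data frame d is hypothetical, not the poster's). It also shows pmax(), a base R function that takes the maximum elementwise across vectors, as one possible alternative to looping over rows. Treat it as a sketch under those assumptions rather than a tested replacement for the code in this thread.

```r
# Hypothetical toy data with two of the date columns
d <- data.frame(
  id      = c("1", "2", "3"),
  mrjdate = as.Date(c("2004-11-04", NA, "2009-10-24")),
  cocdate = as.Date(c("2008-07-18", NA, NA)),
  stringsAsFactors = FALSE
)

# max() collapses everything it is given into a single value
overall <- max(d$mrjdate, d$cocdate, na.rm = TRUE)

# pmax() compares position by position, giving one value per row;
# a row where every date is NA stays NA
rowmax <- pmax(d$mrjdate, d$cocdate, na.rm = TRUE)

print(overall)
print(rowmax)
```

Because pmax() works on the Date vectors directly, the result keeps its Date class and no origin argument is needed.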
> 
> Using your definition of data1:
> 
> data3 <- data1
> data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
>  if (all(is.na(unlist(data1[row, -1])))) {
>    max_d <- NA
>  } else {
>    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
>  }
>  max_d}),
>  origin = "1970-01-01")
> 
> data3
>  id    mrjdate    cocdate    inhdate    haldate    oidflag
> 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
> 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
> 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
> 4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
> 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
> 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
> 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
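
Two details of the code above are easy to miss. unlist() drops the Date class, so the values come back as day counts and need as.Date(..., origin = "1970-01-01"). And max() over nothing but NA values with na.rm = TRUE returns -Inf with a warning, which is why the all-NA row is handled separately. A small sketch on made-up one-row data frames (the names one_row and empty_row are hypothetical):

```r
one_row   <- data.frame(a = as.Date("2009-10-24"), b = as.Date(NA))
empty_row <- data.frame(a = as.Date(NA), b = as.Date(NA))

v <- unlist(one_row)   # Date class is lost; plain numbers remain
v_class <- class(v)
best <- as.Date(max(v, na.rm = TRUE), origin = "1970-01-01")

# With only NA values, max(..., na.rm = TRUE) has nothing left to compare
all_na_max <- suppressWarnings(max(unlist(empty_row), na.rm = TRUE))

print(v_class)
print(best)
print(all_na_max)
```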
> 
> 
> 
> R. Mark Sharp, Ph.D.
> Director of Primate Records Database
> Southwest National Primate Research Center Texas Biomedical Research Institute P.O. Box 760549 San Antonio, TX 78245-0549
> Telephone: (210)258-9476
> e-mail: msharp at TxBiomed.org
> 
> 
> 
> 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

David Winsemius
Alameda, CA, USA


