[R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)
Muhuri, Pradip (SAMHSA/CBHSQ)
Pradip.Muhuri at samhsa.hhs.gov
Mon Nov 10 17:10:14 CET 2014
Hello,
Subsetting with complete.cases() before calling range() removes the NAs for the date variables that were read into the data frame. However, the same approach does not remove the NA for the date variable created with dplyr's mutate(). The console output and a reproducible example are given below. Any advice on how to resolve this issue would be appreciated.
Thanks,
Pradip Muhuri
################# cut and pasted from the R console ####################
  id    mrjdate    cocdate    inhdate    haldate    oiddate
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>
> # range of dates
>
> range(data2$mrjdate[complete.cases(data2$mrjdate)])
[1] "2004-11-04" "2009-10-24"
> range(data2$cocdate[complete.cases(data2$cocdate)])
[1] "2005-08-10" "2011-10-05"
> range(data2$inhdate[complete.cases(data2$inhdate)])
[1] "2005-07-07" "2011-10-13"
> range(data2$haldate[complete.cases(data2$haldate)])
[1] "2007-11-07" "2011-11-04"
> range(data2$oiddate[complete.cases(data2$oiddate)])
[1] NA "2011-11-04"
################ reproducible code #############################
library(dplyr)
library(lubridate)
library(zoo)
# data object - description of the
temp <- "id mrjdate cocdate inhdate haldate
1 2004-11-04 2008-07-18 2005-07-07 2007-11-07
2 NA NA NA NA
3 2009-10-24 NA 2011-10-13 NA
4 2007-10-10 NA NA NA
5 2006-09-01 2005-08-10 NA NA
6 2007-09-04 2011-10-05 NA NA
7 2005-10-25 NA NA 2011-11-04"
# read the data object
data1 <- read.table(textConnection(temp),
                    colClasses=c("character", "Date", "Date", "Date", "Date"),
                    header=TRUE, as.is=TRUE
                    )
# create a new column
data2 <- data1 %>%
  rowwise() %>%
  mutate(oiddate=as.Date(max(mrjdate, cocdate, inhdate, haldate,
                             na.rm=TRUE), origin='1970-01-01'))
# print records
print(data2)
# range of dates
range(data2$mrjdate[complete.cases(data2$mrjdate)])
range(data2$cocdate[complete.cases(data2$cocdate)])
range(data2$inhdate[complete.cases(data2$inhdate)])
range(data2$haldate[complete.cases(data2$haldate)])
range(data2$oiddate[complete.cases(data2$oiddate)])
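
A possible explanation, offered as a sketch rather than a confirmed diagnosis: for a row in which every date is missing (id 2 above), max(..., na.rm=TRUE) has nothing left to compare and returns -Inf with a warning. Stored as a Date, that value prints as NA but is not NA, so complete.cases() keeps it. The base R snippet below illustrates the idea outside dplyr; the names d, oid and x are made up for this illustration, and filtering on is.finite() is only one possible workaround.

```r
# Hypothetical illustration in base R; 'd', 'oid' and 'x' are made-up names.
d <- as.Date(c(NA, NA))                  # a "row" in which every date is missing

# With na.rm = TRUE nothing is left to compare, so max() returns -Inf
# (normally with a warning); the result still has class "Date".
oid <- suppressWarnings(max(d, na.rm = TRUE))

print(oid)        # typically displayed as NA
is.na(oid)        # FALSE: the stored value is -Inf, not NA
is.finite(oid)    # FALSE

# Assumed workaround: keep only finite dates, which drops both true NAs
# and the -Inf placeholder before taking the range.
x <- c(as.Date("2008-07-18"), oid, as.Date("2011-11-04"), as.Date(NA))
range(x[is.finite(x)])
```

If this is indeed the cause, the same filter could be tried on the real column, e.g. range(data2$oiddate[is.finite(data2$oiddate)]).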
Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260