[R] Confusion with Converting Factors to Dates using as.Date
Marc Schwartz
marc_schwartz at comcast.net
Wed Dec 10 22:25:46 CET 2008
On 12/10/2008 02:41 PM, Josip Dasovic wrote:
> Dear R-Helpers:
>
> I'm having a problem getting dates into the correct format. I have a
> data frame, which is based on a .csv file that I imported into R via
> read.table.
>
> R has converted my date variables to factors; when I use the as.Date
> command, most of the values are converted "correctly" (and by this I
> guess I mean converted "as I wish them to be") but some have not
> been.
>
> Here's what I have:
>
> str(pk.df)
>
> 'data.frame': 206 obs. of 134 variables:
>  $ uniqid : int 010 015 120 130 210 245 320 330 415 ...
>  $ st_date: Factor w/ 154 levels "01/01/48","01/01/51",..: 46 27 NA 12 118 NA 63 127 NA NA ...
>  ...
>
> I then convert them to a date class using
>
> st_date.new<-as.Date(st_date, "%m/%d/%y")
>
> This _seems_ to work...
>
> str(st_date.new)
> Class 'Date'  num [1:206] 8150 8466 NA 33982 10149 ...
>
> But notice the 4th observation; I would like it to be 1963, not 2063.
>
>
> st_date.new[1:10]
>  [1] "1992-04-25" "1993-03-07" NA           "2063-01-15" "1997-10-15"
>  [6] NA           "1991-05-31" "1994-11-20" NA           NA
>
> st_date[1:10]
>  [1] 04/25/92 03/07/93 <NA>     01/15/63 10/15/97 <NA>     05/31/91
>  [8] 11/20/94 <NA>     <NA>
> 154 Levels: 01/01/48 01/01/51 01/01/52 01/01/59 01/01/63 ... 12/31/96
>
>
> I thought that the problem might be that I was converting a factor,
> so I first converted the variable to a character type (although I
> understand that this is done automatically) and then to date class,
> but I still had the same problem. Does anybody know how I can solve
> this and why I am getting this behavior? One more tidbit: the
> earliest date for which the date conversion is "correct" is
> 1969-04-15, while the most recent date for which the century is
> "incorrect" is 1967-11-05.
>
> Thanks, Josip
This is a consequence of using a two-digit year rather than a four-digit
year, which, BTW, was one of the Y2K issues raised a decade ago...
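As per ?strptime:
   %y
      Year without century (00–99). If you use this on input, which
      century you get is system-specific. So don't! Often values up to 68
      (or 69) are prefixed by 20 and 69 (or 70) to 99 by 19.
That matches what you are seeing: your dates from 1969 onward come out
right, while those from 1967 and earlier are pushed into the 2000s.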
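Since the split point is system-specific, one way to see what your own R build does is to parse every two-digit year once and check which century each one lands in. This is a minimal sketch, not something from the original thread; the variable names are arbitrary, and the result printed at the end will depend on your system.

```r
# Probe which century this R build assigns to each two-digit year under %y.
yy <- sprintf("%02d", 0:99)                       # "00", "01", ..., "99"
d <- as.Date(paste0("01/01/", yy), format = "%m/%d/%y")
century <- substr(format(d, "%Y"), 1, 2)          # "19" or "20" for each year

# Two-digit years that end up in the 2000s on this system
yy[century == "20"]
```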
If you know that all of your dates fall before 2000, you can use a regex
to convert the two-digit year to a four-digit year and then call
as.Date() with '%Y':
> st_date <- "01/15/63"
> sub("([0-9]{2})$", "19\\1", st_date)
[1] "01/15/1963"
> as.Date(sub("([0-9]{2})$", "19\\1", st_date), format = "%m/%d/%Y")
[1] "1963-01-15"
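The sub() call above is vectorized, so it can be applied to the whole st_date column at once, but it assumes every date is before 2000. If the data may span the century boundary, a sketch along the following lines picks the century from a cutoff instead. The helper name expand_two_digit_year(), its pivot argument and the sample values are hypothetical, not from the original thread; the pivot is an assumption about your data that you have to supply, not something R can infer.

```r
# Sketch only: 'expand_two_digit_year' is a hypothetical helper.
# Two-digit years greater than 'pivot' are read as 19xx, the rest as 20xx.
expand_two_digit_year <- function(x, pivot) {
  x <- as.character(x)                              # factors -> character
  yy <- as.integer(sub("^.*([0-9]{2})$", "\\1", x)) # trailing two digits
  yyyy <- ifelse(yy > pivot, 1900L + yy, 2000L + yy)
  out <- paste0(sub("[0-9]{2}$", "", x), yyyy)      # swap yy for yyyy
  out[is.na(x)] <- NA                               # keep missing values NA
  out
}

# Made-up mm/dd/yy values, stored as a factor and including a missing one
st_date <- factor(c("04/25/92", "01/15/63", NA, "10/15/97", "01/01/05"))
st_date.new <- as.Date(expand_two_digit_year(st_date, pivot = 8),
                       format = "%m/%d/%Y")
st_date.new
```

With pivot = 8, "63" becomes 1963 and "05" becomes 2005; dates whose century cannot be decided from a single cutoff still need a four-digit year at the source.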
The better option is to make sure that the source of your data writes
dates with a four-digit year before you import them into R.
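If the export can be fixed, the import side might look roughly like this. It is a sketch that assumes a CSV whose st_date column already uses mm/dd/yyyy dates; the inline data are made up for illustration. colClasses keeps st_date as character so it never becomes a factor, and as.Date() then parses it with '%Y'.

```r
# Made-up CSV text standing in for an export with four-digit years.
csv <- "uniqid,st_date
10,04/25/1992
120,
130,01/15/1963"

# Read st_date as character so read.csv() does not turn it into a factor.
pk.df <- read.csv(text = csv, colClasses = c(st_date = "character"))

# Parse with %Y (four-digit year); the blank entry becomes NA.
pk.df$st_date <- as.Date(pk.df$st_date, format = "%m/%d/%Y")
str(pk.df)
```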
See ?sub and ?regex
HTH,
Marc Schwartz