[R] CSV file and date. Dates are read as factors!

Peter Dalgaard p.dalgaard at biostat.ku.dk
Thu Jul 28 17:15:36 CEST 2005


Don MacQueen <macq at llnl.gov> writes:

> It's really pretty simple.
> 
> First, if you supply as.is=TRUE to read.csv() [or read.table()] then 
> your dates will be read as character strings, not factors. That saves 
> the step of converting them from factor to character.
> 
> Then, use as.Date() to convert the date columns to objects of class 
> "Date". You will have to specify the format, if your dates are not in 
> the default format.
> 
> >  tmp <- as.Date('2002-5-1')
> >  as.Date(Sys.time())-tmp
> Time difference of 1184 days
> 
> If your dates include times, then use as.POSIXct() instead of as.Date().
> 
> >  tmp <- as.POSIXct('2002-5-1 13:21')
> >  Sys.time()-tmp
> Time difference of 1183.746 days
> 
> If you don't want to use as.is, perhaps because you have other 
> columns that you *want* to have as factors, then either supply 
> colClasses to read.csv, or else just use format() to convert the 
> factors to character.
> 
> as.Date(format(your_date_column))
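
A minimal sketch of the two routes described above. The inline CSV text and
the column names preDate and postDate are made up for illustration. Note that
colClasses = "Date" only helps when the dates are already in the default
yyyy-mm-dd format; otherwise read the column as character and pass a format
to as.Date():

```r
# Hypothetical data standing in for the CSV exported from Excel
csv_iso <- "id,preDate,postDate
1,2002-05-01,2005-07-28"
csv_us <- "id,preDate,postDate
1,5/1/2002,7/28/2005"

# Route 1: dates in the default format can be converted while reading
a <- read.csv(text = csv_iso,
              colClasses = c(preDate = "Date", postDate = "Date"))
a$postDate - a$preDate

# Route 2: read the dates as character, then convert with an explicit format
b <- read.csv(text = csv_us, as.is = TRUE)
b$Followup <- as.Date(b$postDate, format = "%m/%d/%Y") -
              as.Date(b$preDate,  format = "%m/%d/%Y")
b$Followup
```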

Actually, you can forget about the as.is stuff from R 2.1.1 onwards,
since as.Date works happily with factors:

> as.Date.factor
function (x, ...)
as.Date(as.character(x), ...)

(Previous versions forgot to pass the ... arguments, so there it only
worked if the standard format was used.) I suspect that as.character()
is preferable to format(), since format() pads strings to a common
width and that padding could cause trouble.
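
To illustrate, a small sketch with made-up dates in a non-default
day/month/year format (the factor x is hypothetical):

```r
# A factor such as read.csv() might produce for a date column
x <- factor(c("1/5/2002", "28/07/2005"))

# The extra arguments (here format) are passed through to the character method
d <- as.Date(x, format = "%d/%m/%Y")
class(d)
diff(d)

# as.character() returns the labels unchanged ...
nchar(as.character(x))
# ... whereas format() pads them with blanks to a common width
nchar(format(x))
```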

However, you can apply as.is selectively to columns: it can be a
logical vector or a vector of column indices (numeric or character).
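
A sketch of the selective form, assuming a hypothetical file in which group
should remain a factor while the two date columns are kept as character.
The data and column names are invented; stringsAsFactors = TRUE is spelled
out only because its default changed in R 4.0.0:

```r
# Hypothetical CSV contents
csv <- "id,group,preDate,postDate
1,a,2002-05-01,2005-07-28
2,b,2003-01-15,2005-07-28"

# Only the columns named in as.is are left as character strings
dat <- read.csv(text = csv, stringsAsFactors = TRUE,
                as.is = c("preDate", "postDate"))
sapply(dat, class)

# The follow-up time is then a simple difference of Date objects
dat$Followup <- as.Date(dat$postDate) - as.Date(dat$preDate)
dat$Followup
```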
 
> As an aside, you might save yourself some time by using read.xls() 
> from the gdata package.
> 
> And of course, there's always the ugly work-around. In your Excel, 
> create new columns in which the dates are formatted as numbers, 
> presumably as the number of days since whatever Excel uses for its 
> origin. Then, in R, you can simply subtract the numbers. If you have 
> date-time values in Excel, this might be a little trickier.
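
On the R side that work-around might look like the sketch below. It assumes
the workbook uses Excel's default 1900 date system, for which an origin of
1899-12-30 is the usual choice; the serial numbers are hypothetical:

```r
pre_serial  <- 37377   # hypothetical Excel serial number for preDate
post_serial <- 38561   # hypothetical Excel serial number for postDate

# Follow-up time in days, straight from the numbers
post_serial - pre_serial

# Or convert the serial numbers to Date objects first
pre  <- as.Date(pre_serial,  origin = "1899-12-30")
post <- as.Date(post_serial, origin = "1899-12-30")
post - pre
```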
> 
> -Don
> 
> At 9:28 PM -0400 7/27/05, John Sorkin wrote:
> >I am using read.csv to read a CSV file (produced by saving an Excel file
> >as a CSV file). The columns containing dates are being read as factors.
> >Because of this, I can not compute follow-up time, i.e.
> >Followup<-postDate-preDate. I would appreciate any suggestion that would
> >help me read the dates as dates and thus allow me to calculate follow-up
> >time.
> >Thanks
> >John
> >
> >John Sorkin M.D., Ph.D.
> >Chief, Biostatistics and Informatics
> >Baltimore VA Medical Center GRECC and
> >University of Maryland School of Medicine Claude Pepper OAIC
> >
> >University of Maryland School of Medicine
> >Division of Gerontology
> >Baltimore VA Medical Center
> >10 North Greene Street
> >GRECC (BT/18/GR)
> >Baltimore, MD 21201-1524
> >
> >410-605-7119
> >--- NOTE NEW EMAIL ADDRESS:
> >jsorkin at grecc.umaryland.edu
> >
> >______________________________________________
> >R-help at stat.math.ethz.ch mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> 
> 
> -- 
> --------------------------------------
> Don MacQueen
> Environmental Protection Department
> Lawrence Livermore National Laboratory
> Livermore, CA, USA
> 

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907



