[R] CSV file and date. Dates are read as factors!
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Thu Jul 28 17:15:36 CEST 2005
Don MacQueen <macq at llnl.gov> writes:
> It's really pretty simple.
>
> First, if you supply as.is=TRUE to read.csv() [or read.table()] then
> your dates will be read as character strings, not factors. That saves
> the step of converting them from factor to character.
>
> Then, use as.Date() to convert the date columns to objects of class
> "Date". You will have to specify the format, if your dates are not in
> the default format.
>
> > tmp <- as.Date('2002-5-1')
> > as.Date(Sys.time())-tmp
> Time difference of 1184 days
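A small sketch of the as.Date() step with an explicit format. It assumes the CSV holds month/day/year strings, a layout Excel often uses when saving CSV. The dates are made-up examples; preDate, postDate and Followup are the names from the original question.

```r
# Hypothetical dates in month/day/year form (assumed Excel CSV layout)
preDate  <- as.Date(c("5/1/2002", "12/15/2003"), format = "%m/%d/%Y")
postDate <- as.Date(c("7/28/2005", "1/2/2004"),  format = "%m/%d/%Y")

# Subtracting two Date vectors gives a "difftime" object measured in days
Followup <- postDate - preDate
Followup
# as.numeric() strips the difftime class for use in ordinary arithmetic or models
as.numeric(Followup, units = "days")   # 1184 18
```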
>
> If your dates include times, then use as.POSIXct() instead of as.Date().
>
> > tmp <- as.POSIXct('2002-5-1 13:21')
> > Sys.time()-tmp
> Time difference of 1183.746 days
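For date-times in a non-default layout, as.POSIXct() also takes a format argument, and difftime() lets you choose the units instead of relying on "-". The sketch below uses invented timestamps and sets tz = "UTC" so that daylight-saving shifts stay out of the arithmetic. Both of these choices are assumptions, not something the data dictates.

```r
# Hypothetical date-times in month/day/year hour:minute form; tz = "UTC" is an
# assumption that keeps daylight-saving shifts out of the arithmetic
t0 <- as.POSIXct("5/1/2002 13:21", format = "%m/%d/%Y %H:%M", tz = "UTC")
t1 <- as.POSIXct("5/3/2002 09:51", format = "%m/%d/%Y %H:%M", tz = "UTC")

# "-" picks the units itself (here days); difftime() fixes them explicitly
t1 - t0
difftime(t1, t0, units = "hours")   # Time difference of 44.5 hours
```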
>
> If you don't want to use as.is, perhaps because you have other
> columns that you *want* to have as factors, then either supply
> colClasses to read.csv, or else just use format() to convert the
> factors to character.
>
> as.Date(format(your_date_column))
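The colClasses route can be sketched with inline CSV text standing in for the Excel export. The id and group columns and the month/day/year format are hypothetical. The idea is to give one class per column, so that the grouping column stays a factor while the date columns arrive as character and are then converted.

```r
# Hypothetical CSV standing in for the Excel export; 'id' and 'group' are made up
csv <- "id,group,preDate,postDate
1,A,5/1/2002,7/28/2005
2,B,12/15/2003,1/2/2004"

# One class per column: 'group' stays a factor, the dates come in as character
d <- read.csv(text = csv,
              colClasses = c("integer", "factor", "character", "character"))

# Convert the character columns with an explicit format, then subtract
d$preDate  <- as.Date(d$preDate,  format = "%m/%d/%Y")
d$postDate <- as.Date(d$postDate, format = "%m/%d/%Y")
d$Followup <- d$postDate - d$preDate
str(d)
```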
Actually, you can forget about the as.is stuff from 2.1.1 onwards
since as.Date works happily with factors:
> as.Date.factor
function (x, ...)
as.Date(as.character(x), ...)
(Earlier versions forgot to pass the ... arguments on, so there it only
worked if the dates were in the standard format.) I suspect that
as.character() is preferable to format(), since format() may pad the
strings.
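A quick check of the two points above, using an invented factor of month/day/year strings. (read.csv() returned such factors by default before R 4.0.0.) The factor goes straight into as.Date() with a format. The nchar() calls show the padding that format() introduces and as.character() avoids.

```r
# A hypothetical factor of dates, the kind read.csv() produced by default before R 4.0.0
f <- factor(c("5/1/2002", "12/15/2003", "5/1/2002"))

# The factor method passes 'format' on, so no manual conversion is needed
as.Date(f, format = "%m/%d/%Y")

# as.character() returns the labels unchanged; format() pads them to a common width
nchar(as.character(f))   # 8 10 8
nchar(format(f))         # 10 10 10
```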
If you do want as.is, it can be applied selectively to columns: it may
be a logical vector or a vector of indices (numeric positions or
character column names).
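A sketch of the three ways to give as.is, on hypothetical inline CSV data. Only the two date columns are listed. Columns not listed are still converted by type.convert(), so the made-up group column becomes a factor in all three versions.

```r
# Hypothetical CSV; only the two date columns should stay character
csv <- "id,group,preDate,postDate
1,A,5/1/2002,7/28/2005
2,B,12/15/2003,1/2/2004"

# Three equivalent ways of selecting the as.is columns
by_name  <- read.csv(text = csv, as.is = c("preDate", "postDate"))
by_index <- read.csv(text = csv, as.is = c(3, 4))
by_flag  <- read.csv(text = csv, as.is = c(FALSE, FALSE, TRUE, TRUE))

# Unlisted character columns are still converted, so 'group' is a factor
sapply(by_name, class)
```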
> As an aside, you might save yourself some time by using read.xls()
> from the gdata package.
>
> And of course, there's always the ugly work-around. In your Excel,
> create new columns in which the dates are formatted as numbers,
> presumably as the number of days since whatever Excel uses for its
> origin. Then, in R, you can simply subtract the numbers. If you have
> date-time values in Excel, this might be a little trickier.
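If the dates do arrive as Excel serial day numbers, the subtraction is plain arithmetic. The numbers can also be turned back into Date objects with an origin. The sketch assumes Excel's 1900 date system, for which "1899-12-30" is the origin commonly used for dates after February 1900. A workbook using the 1904 date system would need "1904-01-01" instead. The serial values are illustrative only.

```r
# Hypothetical Excel serial day numbers (assumed 1900 date system)
pre  <- c(37377, 37970)
post <- c(38561, 37988)

# Follow-up in days is ordinary numeric subtraction
post - pre   # 1184 18

# Back to Date objects; use origin = "1904-01-01" for the 1904 date system
as.Date(pre, origin = "1899-12-30")
```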
>
> -Don
>
> At 9:28 PM -0400 7/27/05, John Sorkin wrote:
> >I am using read.csv to read a CSV file (produced by saving an Excel file
> >as a CSV file). The columns containing dates are being read as factors.
> >Because of this, I can not compute follow-up time, i.e.
> >Followup<-postDate-preDate. I would appreciate any suggestion that would
> >help me read the dates as dates and thus allow me to calculate follow-up
> >time.
> >Thanks
> >John
> >
> >John Sorkin M.D., Ph.D.
> >Chief, Biostatistics and Informatics
> >Baltimore VA Medical Center GRECC and
> >University of Maryland School of Medicine Claude Pepper OAIC
> >
> >University of Maryland School of Medicine
> >Division of Gerontology
> >Baltimore VA Medical Center
> >10 North Greene Street
> >GRECC (BT/18/GR)
> >Baltimore, MD 21201-1524
> >
> >410-605-7119
> >--- NOTE NEW EMAIL ADDRESS:
> >jsorkin at grecc.umaryland.edu
> >
> >______________________________________________
> >R-help at stat.math.ethz.ch mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
>
> --
> --------------------------------------
> Don MacQueen
> Environmental Protection Department
> Lawrence Livermore National Laboratory
> Livermore, CA, USA
>
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907