[R] Help on using read.table with files containing dates
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Jun 25 15:16:55 CEST 2003
On Wed, 25 Jun 2003, Chriss, Neil wrote:
> I am a relatively new user of R and have a question about using the
> read.table command for reading in .csv files that contain dates.
Unfortunately they appear to contain perversions of dates, not the dates
recognised by the ISO standard. I read your dates as the first of
Jan, Apr and May 2003, but some illogical people use mm/dd/yy order.
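To make the ambiguity concrete, here is a minimal sketch using base R's
as.Date. The two format strings are assumptions about the possible readings,
not something stated in the file itself:

```r
# The same string read under the two competing conventions.
x <- "1/4/2003"
dmy <- as.Date(x, format = "%d/%m/%Y")  # day/month/year reading
mdy <- as.Date(x, format = "%m/%d/%Y")  # month/day/year reading

format(dmy)  # "2003-04-01", i.e. the first of April
format(mdy)  # "2003-01-04", i.e. the fourth of January
```

Only the person who produced the file can say which reading is intended.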
> My
> ultimate goal is to be able to read several different files into data frames
> and then do a merge along the "Date" column of each data frame. It seems I
> should be able to specify the name of the column that contains dates and
> then automatically convert that to dates, all in a single read.table
> statement. It's such a natural thing to do (and the help for read.table
> gives me hope, but I have not been able to figure out some of the options,
> e.g., colNames).
>
> Here is a specific example of what I mean and what the problems are. An
>
> First I load a sample data file whose first column is a date. This gives
> the wrong answer as we shall see:
>
> > library(date)
> > sampData <- read.table("sampleData.csv",sep=",",header=TRUE)
> > sampData
> Date Col1 Col2 Col3
> 1 1/1/2003 1.2 1.4 0.160
> 2 1/4/2003 1.8 1.2 0.900
> 3 1/5/2003 0.9 1.1 -0.003
> > mode(sampData$Date)
> [1] "numeric"
> >
>
> Note that the Date column is coerced incorrectly into being numeric.
It is not. mode() is inappropriate here: you actually have a factor,
whose mode is numeric. Use class(), not mode().
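A small sketch of the difference, using a factor built by hand from the three
date strings rather than the actual file:

```r
# A factor built from the date strings in the sample file.
f <- factor(c("1/1/2003", "1/4/2003", "1/5/2003"))

mode(f)          # "numeric": describes the underlying integer codes
class(f)         # "factor": what the object actually is
as.character(f)  # recovers the original strings
```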
> Right
> now what I do is force R to read this in as a character using "as.is" and
> then convert it to a date as follows:
>
> > sampData <- read.table("sampleData.csv",sep=",",header=TRUE,as.is=1)
> > sampData
> Date Col1 Col2 Col3
> 1 1/1/2003 1.2 1.4 0.160
> 2 1/4/2003 1.8 1.2 0.900
> 3 1/5/2003 0.9 1.1 -0.003
> > mode(sampData$Date)
> [1] "character"
> > sampData$Date <- as.date(sampData$Date)
> > sampData
> Date Col1 Col2 Col3
> 1 15706 1.2 1.4 0.160
> 2 15709 1.8 1.2 0.900
> 3 15710 0.9 1.1 -0.003
> >
>
> Now the Date column is converted to Julian, which is good b/c if I repeat
> this procedure with other dataframes I can do a merge on the date columns
> and line up the dates. But, there are two drawbacks with this approach
> which I do not know how to solve:
>
> 1. it's generally clumsy and takes too many lines.
>
> 2. (more importantly) it forces me to specify which column number contains
> the dates instead of simply stating which column header name contains date.
> I would rather simply specify the column header name of the column that contains the
> date (in this example: Date) so that whichever column contains the date will
> be automatically converted to dates.
>
> So, is there a way to specify that I want the column labeled "Date" to be
> read as a date class (as in as.date)?
Yes, that's what the argument colClasses is for. But since as.date is
not part of R (it is in contributed packages survival and date) you will
need to provide an as() method.
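One way this might look is sketched below. The class name "dmyDate" is made up
for illustration, the conversion targets base R's Date class rather than
as.date, and the day/month/year format is an assumption about the file. The
data are supplied inline so the sketch is self-contained.

```r
library(methods)

# "dmyDate" is a hypothetical class name, used only to hook in a conversion.
setClass("dmyDate")
setAs("character", "dmyDate",
      function(from) as.Date(from, format = "%d/%m/%Y"))

# Inline copy of the sample data shown in the question.
csv <- "Date,Col1,Col2,Col3
1/1/2003,1.2,1.4,0.160
1/4/2003,1.8,1.2,0.900
1/5/2003,0.9,1.1,-0.003"

# Naming the element of colClasses selects the column by its header,
# so the position of the Date column does not matter.
sampData <- read.table(text = csv, sep = ",", header = TRUE,
                       colClasses = c(Date = "dmyDate"))
class(sampData$Date)
```

With a real file, text = csv would be replaced by the file name; the other
columns are left for read.table to type as usual.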
I suggest you use
sampData[["Date"]] <- strptime(as.character(sampData[["Date"]]), "%d/%m/%Y")
since the conversion to factor can be undone very easily (note %Y, not %y,
for a four-digit year).
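For the eventual merge, a minimal sketch along the same lines follows. The two
data frames and their values are made up to stand in for separately read
files, and as.Date is used in place of strptime on the assumption that day
resolution is all that is needed.

```r
# Two small data frames standing in for separately read files (made-up values).
a <- data.frame(Date = c("1/1/2003", "1/4/2003", "1/5/2003"),
                Col1 = c(1.2, 1.8, 0.9))
b <- data.frame(Date = c("1/4/2003", "1/5/2003", "1/6/2003"),
                Other = c(10, 20, 30))

# Convert by column name; as.character() undoes any factor conversion.
a[["Date"]] <- as.Date(as.character(a[["Date"]]), format = "%d/%m/%Y")
b[["Date"]] <- as.Date(as.character(b[["Date"]]), format = "%d/%m/%Y")

# Inner join on the shared column: only dates present in both frames survive.
merged <- merge(a, b, by = "Date")
merged
```

Here merged keeps the two dates common to both frames; all = TRUE in merge()
would keep the unmatched rows as well, filled with NA.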
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595