[R] Date in dataframe manipulation

Dan Chan dchan at GFC.STATE.GA.US
Mon Mar 27 16:20:05 CEST 2006


Thank you, Marc and Don, for your help, especially Marc.

Output <- subset(FireDataAppling, select = c(STARTDATE, County, TOTAL, CAUSE))
Worked! 
STARTDATE IS a factor and I used the following command to get the
yyyy-mm-dd format of the date
Output$Date<- as.POSIXct(Output$STARTDATE)

Thank you! 

Daniel Chan

-----Original Message-----
From: Marc Schwartz (via MN) [mailto:mschwartz at mn.rr.com] 
Sent: Friday, March 24, 2006 9:22 PM
To: Dan Chan
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Date in dataframe manipulation

On Fri, 2006-03-24 at 15:29 -0500, Dan Chan wrote:
> Hi,
> 
> I have a dataframe with many columns, including date and I want to keep
> only a few of the columns including date column.
> 
> I used the following command: 
> with(FireDataAppling, cbind(STARTDATE, County, TOTAL, CAUSE))
> 
> It works, but the date becomes days from Jan 1, 2001.  
> 
> FireDataAppling$STARTDATE[1] gives
> [1] 2001-01-04 00:00:00  
> 1703 Levels: .........

This output suggests that STARTDATE is a factor, rather than a Date
related data type. Did you read this data in via one of the read.table()
family of functions? If these values are quoted character fields in the
imported text file, they will be converted to factors by default.
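
As a rough illustration of that import behavior, here is a small sketch. The
two-row sample text is made up and only stands in for the real file. Note
that the default described above applied to R at the time; since R 4.0.0
stringsAsFactors defaults to FALSE, so it is set explicitly here.

```r
# Hypothetical sample standing in for the imported text file
txt <- 'STARTDATE County TOTAL
"2001-01-04 00:00:00" Appling 1.5
"2001-01-09 00:00:00" Appling 0.3'

# stringsAsFactors = TRUE reproduces the old default conversion to factors
as_factor <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
class(as_factor$STARTDATE)   # "factor"

# Keeping the column as character avoids the factor step entirely
as_char <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
class(as_char$STARTDATE)     # "character"
```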

> After the cbind command, the entry becomes a 4.  
> 
> I want to get 2001-01-04.  What command should I use?  
> 
> Thank you. 

You might want to review the "Note" section in ?cbind, relative to the
result of cbind()ing vectors of differing data types. By using with(),
you are effectively taking the data frame columns as individual vectors
and the resultant _matrix_ will be coerced to a single data type, in
this case, presumably numeric. I am guessing that 'County' and 'CAUSE'
are also factors, whereas 'TOTAL' is numeric.

Using str(FireDataAppling) will give you some insight into the structure
of your data frame.
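
A minimal sketch of the coercion, using a made-up data frame ('fire') in
place of FireDataAppling. Only the column names come from your post; the
values and the assumption that three of the columns are factors are mine.

```r
# Hypothetical stand-in for FireDataAppling
fire <- data.frame(
  STARTDATE = factor(c("2001-01-04 00:00:00", "2001-01-09 00:00:00")),
  County    = factor(c("Appling", "Appling")),
  TOTAL     = c(1.5, 0.3),
  CAUSE     = factor(c("Debris", "Lightning"))
)

# Shows the type of each column (Factor, num, ...)
str(fire)

# cbind() on the individual vectors returns a matrix, so every column
# must share one type; factors are replaced by their internal codes
m <- with(fire, cbind(STARTDATE, County, TOTAL, CAUSE))
is.matrix(m)       # TRUE
mode(m)            # "numeric"
m[, "STARTDATE"]   # integer codes, not dates
```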

The '4' that you are getting is the factor level numeric code for the
entry above, not the number of days since Jan 1, 2001, which is not a
default 'origin' date in R. Jan 1, 1970 is.

You might want to look at ?factor for more insight here.
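
To make the distinction concrete, a short sketch with invented values: the
number you see for a factor is the position of its label among the sorted
levels, while a true Date converts to days since Jan 1, 1970.

```r
# Invented example values; levels are sorted, so the 2001-01-02 label is level 1
f <- factor(c("2001-01-04 00:00:00",
              "2001-01-02 00:00:00",
              "2001-01-04 00:00:00"))

levels(f)          # the distinct labels, in sorted order
as.integer(f)      # internal codes: 2 1 2
as.character(f)    # the original labels

# A real Date, by contrast, is stored as days since the 1970-01-01 origin
as.numeric(as.Date("1970-01-05"))   # 4
```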

If you want to retain only a _subset_ of the columns in a data frame,
use the subset() function:

  subset(FireDataAppling, select = c(STARTDATE, County, TOTAL, CAUSE))

This will return a data frame and retain the original data types. If you
want to then perform actual Date based operations on those values, take
a look at ?DateTimeClasses, paying attention to the "See Also" section
relative to associated functions.
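
One possible way to go from the subset to real dates, sketched on a made-up
data frame ('fire2', with invented values). The format string assumes the
labels look like the "2001-01-04 00:00:00" entry you showed.

```r
# Hypothetical stand-in for FireDataAppling, with one extra unused column
fire2 <- data.frame(
  STARTDATE = factor(c("2001-01-04 00:00:00", "2001-01-09 00:00:00")),
  County    = factor(c("Appling", "Appling")),
  TOTAL     = c(1.5, 0.3),
  CAUSE     = factor(c("Debris", "Lightning")),
  OTHER     = c(10, 20)
)

# subset() returns a data frame, so STARTDATE stays a factor
out <- subset(fire2, select = c(STARTDATE, County, TOTAL, CAUSE))
class(out$STARTDATE)   # "factor"

# Convert the labels (not the codes) to Date, stating the format explicitly
out$Date <- as.Date(as.character(out$STARTDATE), format = "%Y-%m-%d %H:%M:%S")
class(out$Date)        # "Date"

# Date arithmetic now works, e.g. days between consecutive fires
diff(out$Date)
```

Going through as.character() first keeps the conversion independent of how
the factor levels happen to be ordered.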

HTH,

Marc Schwartz



