[R] RE: Reading Dates in a csv File

Tue Feb 8 06:46:59 CET 2005

My first thought was that it all looked a bit complicated for something that should be straightforward.

I created a file called t.txt. I worked out the way I would have done it and then tested to see which was faster. One little hiccup is that the two objects are not identical, and I thought they would be. Of course I could have made a typo somewhere, but there may also be something I have not come across. Guess it's time to see what identical() really means.
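
For anyone who wants to rerun the session below, here is a minimal sketch of how t.txt could be recreated. It assumes the file holds exactly the five records from the raw data quoted later in this message, and that the working directory is writable; the variable name raw is only for illustration.

```r
# The five records from the raw data quoted later in this message
raw <- c(
  "MHK,76.53,05/21/2004,5/4/2004 4:00:00 PM,60",
  "MHK,76.53,06/21/2004,5/5/2004 4:00:00 PM,60",
  "MHK,76.53,07/21/2004,5/6/2004 4:00:00 PM,65",
  "MHK,76.53,08/21/2004,5/7/2004 4:00:00 PM,65",
  "MHK,76.53,09/21/2004,5/8/2004 4:00:00 PM,70"
)

# "t.txt" is the file name used in the session below; it is written to the
# current working directory (assumption: that directory is writable)
writeLines(raw, "t.txt")
```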

> system.time({
+ file <- read.csv("t.txt",header=FALSE,
+                     col.names =c("c_field_1",
+                                 "n_field_2",
+                                 "d_field_3",
+                                 "d_field_4",
+                                 "n_field_5"),
+                      colClasses = c("character",
+                                   "numeric",
+                                   "character",
+                                   "character",
+                                   "numeric")
+ )
+ file$d_field_3 <- as.POSIXct(strptime(file$d_field_3,format="%m/%d/%Y" ))
+ file$d_field_4 <- as.POSIXct(strptime(file$d_field_4,format="%m/%d/%Y %I:%M:%S %p" ))
+  })
[1] 0.00 0.00 0.02   NA   NA
>  
> 
> 
> read_file <- function(file,nrows=-1) {
+ 
+    # create temp classes
+    setClass("t_class_",representation("character"))
+    setAs("character", "t_class_", function(from)
+ as.POSIXct(strptime(from,format="%m/%d/%Y")))
+ 
+    setClass("t_class2_", representation("character"))
+    setAs("character", "t_class2_", function(from)
+ as.POSIXct(strptime(from,format="%m/%d/%Y %I:%M:%S %p")))
+ 
+    # read the file
+    file <- read.csv(file,
+                     header=FALSE,
+                     comment.char = "",
+                     nrows=nrows,
+                     as.is=FALSE,
+                     col.names=c("c_field_1",
+                                 "n_field_2",
+                                 "d_field_3",
+                                 "d_field_4",
+                                 "n_field_5"),
+                      colClasses=c("character",
+                                   "numeric",
+                                   "t_class_",
+                                   "t_class2_",
+                                   "numeric")
+                      )
+ 
+    # remove them now that we are done with them
+    removeClass("t_class_")
+    removeClass("t_class2_")
+ 
+    return(file)
+ 
+ }
> system.time(file2 <- read_file("t.txt"))
[1] 0.14 0.00 0.16   NA   NA
> 
> identical(file, file2)
[1] FALSE
> 
> file
  c_field_1 n_field_2  d_field_3           d_field_4 n_field_5
1       MHK     76.53 2004-05-21 2004-05-04 16:00:00        60
2       MHK     76.53 2004-06-21 2004-05-05 16:00:00        60
3       MHK     76.53 2004-07-21 2004-05-06 16:00:00        65
4       MHK     76.53 2004-08-21 2004-05-07 16:00:00        65
5       MHK     76.53 2004-09-21 2004-05-08 16:00:00        70
> file2
  c_field_1 n_field_2  d_field_3           d_field_4 n_field_5
1       MHK     76.53 2004-05-21 2004-05-04 16:00:00        60
2       MHK     76.53 2004-06-21 2004-05-05 16:00:00        60
3       MHK     76.53 2004-07-21 2004-05-06 16:00:00        65
4       MHK     76.53 2004-08-21 2004-05-07 16:00:00        65
5       MHK     76.53 2004-09-21 2004-05-08 16:00:00        70
> str(file)
`data.frame':   5 obs. of  5 variables:
 $ c_field_1: chr  "MHK" "MHK" "MHK" "MHK" ...
 $ n_field_2: num  76.5 76.5 76.5 76.5 76.5
 $ d_field_3:`POSIXct', format: chr  "2004-05-21" "2004-06-21" "2004-07-21" "2004-08-21" ...
 $ d_field_4:`POSIXct', format: chr  "2004-05-04 16:00:00" "2004-05-05 16:00:00" "2004-05-06 16:00:00" "2004-05-07 16:00:00" ...
 $ n_field_5: num  60 60 65 65 70
> str(file2)
`data.frame':   5 obs. of  5 variables:
 $ c_field_1: chr  "MHK" "MHK" "MHK" "MHK" ...
 $ n_field_2: num  76.5 76.5 76.5 76.5 76.5
 $ d_field_3:`POSIXct', format: chr  "2004-05-21" "2004-06-21" "2004-07-21" "2004-08-21" ...
 $ d_field_4:`POSIXct', format: chr  "2004-05-04 16:00:00" "2004-05-05 16:00:00" "2004-05-06 16:00:00" "2004-05-07 16:00:00" ...
 $ n_field_5: num  60 60 65 65 70
> 
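
Since str() prints the same thing for both objects, the difference is presumably in something it does not show, such as an attribute. I have not established what actually differs between file and file2, so the following is only a rough sketch of how one might narrow it down column by column. compare_columns is a made-up helper, and d1/d2 are toy stand-ins with a deliberately added attribute, not the objects from the session above.

```r
# Hypothetical helper: for each column, report whether the bare values match,
# whether the attributes match, and whether identical() is satisfied.
compare_columns <- function(a, b) {
  stopifnot(identical(names(a), names(b)))
  bare <- function(x) as.vector(unclass(x))  # drop class and other attributes
  data.frame(
    column     = names(a),
    same_value = unname(mapply(function(x, y) isTRUE(all.equal(bare(x), bare(y))), a, b)),
    same_attr  = unname(mapply(function(x, y) identical(attributes(x), attributes(y)), a, b)),
    identical  = unname(mapply(identical, a, b)),
    row.names  = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy data: same values in both frames, but one column of d2 carries an
# extra attribute (purely illustrative, not the cause found in this thread)
d1 <- data.frame(id = c("a", "b"),
                 when = as.POSIXct(c("2004-05-21", "2004-06-21"), tz = "UTC"),
                 stringsAsFactors = FALSE)
d2 <- d1
attr(d2$when, "note") <- "hypothetical"

res <- compare_columns(d1, d2)
print(res)
```

On the real objects this would be compare_columns(file, file2). It may also be worth comparing attributes(file) with attributes(file2), since identical() looks at the data frame's own attributes as well as its columns.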

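With only five rows the timings above probably say more about fixed overhead than about parsing, so a fairer comparison needs a bigger file. The sketch below is one way to set that up. The helper names (make_test_file, two_pass, one_pass), the class names mdy_date and mdy_time, and the row count are all assumptions for illustration. It also registers the conversions once against virtual classes instead of creating and removing classes on every call, which is a variation on the posted function rather than the posted function itself.

```r
library(methods)  # setClass() and setAs()

cols <- c("c_field_1", "n_field_2", "d_field_3", "d_field_4", "n_field_5")

# Hypothetical helper: write n rows in the same layout as the raw data
make_test_file <- function(path, n = 10000) {
  days <- format(as.Date("2004-01-01") + (seq_len(n) - 1) %% 365, "%m/%d/%Y")
  rows <- paste("MHK", "76.53", days, paste(days, "04:00:00 PM"), "60", sep = ",")
  writeLines(rows, path)
  invisible(path)
}

# Two-pass approach: read the dates as character, convert afterwards
two_pass <- function(path) {
  d <- read.csv(path, header = FALSE, col.names = cols,
                colClasses = c("character", "numeric", "character",
                               "character", "numeric"))
  d$d_field_3 <- as.POSIXct(strptime(d$d_field_3, format = "%m/%d/%Y"))
  d$d_field_4 <- as.POSIXct(strptime(d$d_field_4, format = "%m/%d/%Y %I:%M:%S %p"))
  d
}

# Single-pass approach: register the conversions once, outside the reader
setClass("mdy_date")  # virtual classes, used only as colClasses labels
setClass("mdy_time")
setAs("character", "mdy_date", function(from)
  as.POSIXct(strptime(from, format = "%m/%d/%Y")))
setAs("character", "mdy_time", function(from)
  as.POSIXct(strptime(from, format = "%m/%d/%Y %I:%M:%S %p")))

one_pass <- function(path) {
  read.csv(path, header = FALSE, col.names = cols,
           colClasses = c("character", "numeric", "mdy_date",
                          "mdy_time", "numeric"))
}

path <- tempfile(fileext = ".csv")
make_test_file(path)
t_two <- system.time(a <- two_pass(path))
t_one <- system.time(b <- one_pass(path))
print(rbind(two_pass = t_two, one_pass = t_one)[, 1:3])
```

The numbers will vary by machine and R version, so I would repeat the run a few times before reading much into any difference.
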
> -----Original Message-----
> From: Charles and Kimberly Maner [mailto:ckjmaner at carolina.rr.com]
> Sent: Tuesday, 8 February 2005 12:08 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] RE: Reading Dates in a csv File
> 
> 
> 
> Hi all.  Thanks for all of your help/suggestions.  I found an old email in
> the R-help archives, pieced together a couple things and arrived at the
> solution below.  As an additional followup, I thought I would go ahead and
> post it should other readers come across this same situation.  Here goes..
> 
> Raw data:
> MHK,76.53,05/21/2004,5/4/2004 4:00:00 PM,60
> MHK,76.53,06/21/2004,5/5/2004 4:00:00 PM,60
> MHK,76.53,07/21/2004,5/6/2004 4:00:00 PM,65
> MHK,76.53,08/21/2004,5/7/2004 4:00:00 PM,65
> MHK,76.53,09/21/2004,5/8/2004 4:00:00 PM,70
> 
> Code:
> read_file <- function(file,nrows=-1) {
> 
>    # create temp classes
>    setClass("t_class_",representation("character"))
>    setAs("character", "t_class_", function(from)
> as.POSIXct(strptime(from,format="%m/%d/%Y")))
>   
>    setClass("t_class2_", representation("character"))
>    setAs("character", "t_class2_", function(from)
> as.POSIXct(strptime(from,format="%m/%d/%Y %I:%M:%S %p")))
> 
>    # read the file
>    file <- read.csv(file,
>                     header=FALSE,
>                     comment.char = "",
>                     nrows=nrows,
>                     as.is=FALSE,
>                     col.names=c("c_field_1",
>                                 "n_field_2",
>                                 "d_field_3",
>                                 "d_field_4",
>                                 "n_field_5"),
>                      colClasses=c("character",
>                                   "numeric",
>                                   "t_class_",
>                                   "t_class2_",
>                                   "numeric")
>                      )
> 
>    # remove them now that we are done with them
>    removeClass("t_class_")
>    removeClass("t_class2_")
> 
>    return(file)
> 
> }
> 
> If any of you folks know a better way and/or have comments/enhancements to
> this code, feel free to post/email your critique.
> 
> 
> Thanks,
> Charles
> 
> 
> 
> 
> > _____________________________________________ 
> > From: 	Charles and Kimberly Maner [mailto:ckjmaner at carolina.rr.com]
> > 
> > Sent:	Thursday, February 03, 2005 8:35 AM
> > To:	'r-help at stat.math.ethz.ch'
> > Subject:	Reading Dates in a csv File
> > 
> > 
> > Hi all.  I'm reading in a flat, comma-delimited file using read.csv.
> > It works marvelously for the most part.  I am using the colClasses
> > argument to, basically, create numeric, factor and character classes for
> > the columns I'm reading in.  However, a couple of the fields in the file
> > are date fields.  I'm fully aware that POSIXct can be used as a class,
> > however the field must obey, (I think), the standard/default POSIXct
> > format.  Hence the following question:  Does anyone have a method they can
> > share to read in a non-standard formatted date to convert to POSIXct?  I
> > can read it in then convert it, but that's a two pass approach and not as
> > elegant as a single pass through read.csv.  I've read, from the
> > documentation, that "[o]therwise there needs to be an as method (from
> > package methods) for conversion from "character" to the specified formal
> > class" but I do not know and have not figured out how to do that.
> > 
> > Any suggestion(s) would be greatly appreciated.
> > 
> > 
> > Thanks,
> > Charles