[R] Date-Time-Stamp input method for user-specific formats

David Winsemius dwinsemius at comcast.net
Tue Oct 6 00:24:40 CEST 2009


On Oct 5, 2009, at 5:14 PM, esp wrote:

>
> Date-Time-Stamp input method to correctly interpret user-specific
> formats: coding is 90% there - based on example at
> http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html
> ...anyone got the last 10% please?
>
> CONTEXT:
>
> Data is received where one of the columns is a datetimestamp.  At
> midnight, the value represented as text in this column consists of just
> the date part, e.g. "01/09/2009".  At other times, the value in the
> column contains both date and time, e.g. "01/09/2009 00:00:01".  The
> goal is to read it into R as an appropriate data type, where for example
> date arithmetic can be performed.  As far as I can tell, the most
> appropriate such data type is POSIXct.  The trick then is to read in the
> datetimestamps in the data as this type.
>
> PROBLEM:
>
> POSIXct defaults to a text representation almost but not quite like my
> received data.  The main difference is that the POSIXct date part is in
> reverse order, e.g. "2009-09-01".  It is possible to define a different
> format where date and time parts look like my data, but when
> encountering datetimestamps where only the date part is present (as in
> the case of my midnight data) the value is interpreted as NA, i.e.
> undefined.
>
> SOLUTION (ALMOST):
>
> There is a workaround (based on example at
> http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html).  It is
> possible to define a class then read the data in as this class.  For
> such a class it is possible to define a class method, in terms of a
> function, for translating a text (character string) representation into
> a value.  In that function, one can use a conditional expression to
> treat midnight datetimestamps differently from those at other times of
> day.  The example below does that.  In order to apply this function over
> all of the datetimestamp values in the column, it is necessary to use
> something like R's 'sapply' function.
>
> SNAG:
>
> The function below implements this approach.  A datetimestamp with only
> the date part, including leading zeroes, is always length 10
> (characters).  It correctly interprets the datetimestamp values, but
> unfortunately translates them into what appear to be numeric type.  I am
> actually uncertain precisely what is happening, as I am very new to R
> and have most certainly stretched myself in writing this code.  I think
> perhaps it returns a list and something associated with this aspect
> makes it "forget" the data type is POSIXct or at least how such a type
> should be displayed as text or what to do about it.
>
> PLEA:
>
> Please, can anyone give any help whatsoever, however tenuous?
>
> CODE, DATA & RESULTS:
>
> Function to read the required data, intended to make the datetime column
> of the data (example given further below) into POSIXct values:
> <<<
> spot_frequency_readin <- function(file,nrows=-1) {
>
> # create temp class
> setClass("t_class2_", representation("character"))
> setAs("character", "t_class2_", function(from) {sapply(from,  
> function(x) {
>  if (nchar(x)==10) {
> as.POSIXct(strptime(x,format="%d/%m/%Y"))
> }
> else {
> as.POSIXct(strptime(x,format="%d/%m/%Y %H:%M:%S"))
> }
> }
> )
> }
> )
>
> #(for format symbols, see "R Reference Card")
>
> # read the file (TSV)
> file <- read.delim(file, header=TRUE, comment.char = "", nrows=nrows,
> as.is=FALSE, col.names=c("DATETIME", "FREQ"),  
> colClasses=c("t_class2_",
> "numeric") )
>
> # remove it now that we are done with it
> removeClass("t_class2_")
>
> return(file)
> }
>>>>
> This appears to work as regards processing each row of data correctly,
> but the values returned look like numeric equivalents of POSIXct, as
> opposed to the expected character-based (string) equivalents:
>
>
> Example Data:
> <<<
> DATETIME	FREQ
> 01/09/2009	59.036
> 01/09/2009 00:00:01	58.035
> 01/09/2009 00:00:02	53.035
> 01/09/2009 00:00:03	47.033
> 01/09/2009 00:00:04	52.03
> 01/09/2009 00:00:05	55.025
>>>>
>
>
> Example Function Call:
> <<<
>> spot = spot_frequency_readin("mydatafile.txt",4)
>>>>
>
>
> Result of Example Function Call:
> <<<
>> spot[1]
>    DATETIME
>
> 1 1251759600
> 2 1251759601
> 3 1251759602
> 4 1251759603
>>>>
>
>
> What I ideally wanted to see (whether or not the time part of the
> datetimestamp at midnight was displayed):
> <<<
>> spot[1]
>    DATETIME
>
> 01/09/2009 00:00:00
> 01/09/2009 00:00:01
> 01/09/2009 00:00:02
> 01/09/2009 00:00:03
> 01/09/2009 00:00:04
>>>>
>
>
> For the function as defined above using 'sapply'
>> spot[,1]
>          01/09/2009 01/09/2009 00:00:01 01/09/2009 00:00:02 01/09/2009 00:00:03
>          1251759600          1251759601          1251759602          1251759603
>
> This was unexpected - it seems to have displayed the datetimestamp
> values both as per my defined character-string representation and as
> numeric values.
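
A note on why both appear, offered as a minimal sketch rather than a diagnosis of the exact session above: sapply() simplifies its result to a plain vector, which drops the POSIXct class and leaves the underlying seconds since 1970, and with character input it also attaches the input strings as names. The helper name to_ct and the choice of tz = "UTC" are assumptions made here for reproducibility; they are not from the original post.

```r
# Illustrative input; tz = "UTC" is an assumption to keep results reproducible.
x <- c("01/09/2009", "01/09/2009 00:00:01")

# Hypothetical helper mirroring the conditional in the posted setAs() method.
to_ct <- function(s) {
  fmt <- if (nchar(s) == 10) "%d/%m/%Y" else "%d/%m/%Y %H:%M:%S"
  as.POSIXct(strptime(s, format = fmt, tz = "UTC"))
}

res <- sapply(x, to_ct)
class(res)   # "numeric": the POSIXct class is lost when sapply() simplifies
names(res)   # the input strings, added because USE.NAMES = TRUE
```

So the values are parsed correctly; only the class attribute is missing, which is why re-applying as.POSIXct() with an origin restores the date-time display.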

 > as.POSIXct(spot$DATETIME,  origin="1970-01-01")
               01/09/2009       01/09/2009 00:00:01       01/09/2009 00:00:02
"2009-09-01 05:00:00 EDT" "2009-09-01 05:00:01 EDT" "2009-09-01 05:00:02 EDT"
      01/09/2009 00:00:03
"2009-09-01 05:00:03 EDT"

If you want to get rid of the somewhat extraneous names:

 > unname(as.POSIXct(spot$DATETIME,  origin="1970-01-01") )
[1] "2009-09-01 05:00:00 EDT" "2009-09-01 05:00:01 EDT" "2009-09-01 05:00:02 EDT"
[4] "2009-09-01 05:00:03 EDT"

If you want a variable that stays that way:

 > spot$D2 <- as.POSIXct(spot$DATETIME,  origin="1970-01-01")
 > spot
     DATETIME   FREQ                  D2
1 1251777600 59.036 2009-09-01 05:00:00
2 1251777601 58.035 2009-09-01 05:00:01
3 1251777602 53.035 2009-09-01 05:00:02
4 1251777603 47.033 2009-09-01 05:00:03

Or you could overwrite spot$DATETIME.
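
An alternative that avoids the temporary class and sapply() altogether, sketched under some assumptions: read the column as character, pad the 10-character date-only stamps with a midnight time, and parse the whole vector in one call. The function name parse_stamp and the default tz = "UTC" are hypothetical choices for illustration; substitute the time zone the data was recorded in.

```r
# Hypothetical helper (not from the original thread): parse a character
# vector of "dd/mm/YYYY" or "dd/mm/YYYY HH:MM:SS" stamps into POSIXct.
parse_stamp <- function(x, tz = "UTC") {
  x <- as.character(x)
  # Date-only stamps are exactly 10 characters; append midnight to them.
  date_only <- !is.na(x) & nchar(x) == 10
  x[date_only] <- paste(x[date_only], "00:00:00")
  # One vectorised call, so the POSIXct class is kept.
  as.POSIXct(x, format = "%d/%m/%Y %H:%M:%S", tz = tz)
}

stamps <- c("01/09/2009", "01/09/2009 00:00:01", "01/09/2009 00:00:02")
parsed <- parse_stamp(stamps)

# Display in the original day/month/year layout.
format(parsed, "%d/%m/%Y %H:%M:%S")

# Date arithmetic works because the class is intact.
diff(parsed)
```

With the file read using colClasses = "character" for the first column, something like spot$DATETIME <- parse_stamp(spot$DATETIME) would then replace the column in place. format() controls only how the values are displayed; the stored values remain POSIXct.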


>
> Alternatively, if I replace the 'sapply' by a 'lapply' then I get
> something closer to what I expect.  It is at least what looks like R's
> default text representation for POSIXct datetimes, even if it is not in
> my preferred format.
> <<<
>> spot[,1]
>
> [[1]]
> [1] "2009-09-01 BST"
>
> [[2]]
> [1] "2009-09-01 00:00:01 BST"
>
> [[3]]
> [1] "2009-09-01 00:00:02 BST"
>
> [[4]]
> [1] "2009-09-01 00:00:03 BST"
>>>>
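
The lapply() version keeps the class because each list element is still a POSIXct object; the list just needs to be combined into one vector. A minimal sketch, assuming illustrative timestamps and tz = "UTC" (neither taken from the original data file):

```r
# Two POSIXct values held in a list, as lapply() would return them.
# The literal timestamps and tz = "UTC" are illustrative assumptions.
stamp_list <- lapply(
  c("01/09/2009 00:00:00", "01/09/2009 00:00:01"),
  function(s) as.POSIXct(s, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
)

# c() has a POSIXct method, so do.call() rebuilds a single classed vector.
stamps <- do.call(c, stamp_list)
class(stamps)
format(stamps, "%d/%m/%Y %H:%M:%S", tz = "UTC")
```

Passing tz explicitly to format() is deliberate here, since whether c() preserves the time zone attribute has differed between R versions.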
>
> -- 


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

