[R] Date-Time-Stamp input method for user-specific formats
David Winsemius
dwinsemius at comcast.net
Tue Oct 6 00:24:40 CEST 2009
On Oct 5, 2009, at 5:14 PM, esp wrote:
>
> Date-Time-Stamp input method to correctly interpret user-specific
> formats: coding is 90% there - based on example at
> http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html
> ...anyone got the last 10% please?
>
> CONTEXT:
>
> Data is received where one of the columns is a datetimestamp. At midnight,
> the value represented as text in this column consists of just the date part,
> e.g. "01/09/2009". At other times, the value in the column contains both
> date and time e.g. "01/09/2009 00:00:01". The goal is to read it into R as
> an appropriate data type, where for example date arithmetic can be
> performed. As far as I can tell, the most appropriate such data type is
> POSIXct. The trick then is to read in the datetimestamps in the data as
> this type.
>
> PROBLEM:
>
> POSIXct defaults to a text representation almost but not quite like my
> received data. The main difference is that the POSIXct date part is in
> reverse order, e.g. "2009-09-01". It is possible to define a different
> format where date and time parts look like my data, but when encountering
> datetimestamps where only the date part is present (as in the case of my
> midnight data) this is interpreted as NA, i.e. undefined.
>
> SOLUTION (ALMOST):
>
> There is a workaround (based on example at
> http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html). It is possible to
> define a class then read the data in as this class. For such a class it is
> possible to define a class method, in terms of a function, for translating a
> text (character string) representation into a value. In that function, one
> can use a conditional expression to treat midnight datetimestamps
> differently from those at other times of day. The example below does that.
> In order to apply this function over all of the datetimestamp values in the
> column, it is necessary to use something like R's 'sapply' function.
>
> SNAG:
>
> The function below implements this approach. A datetimestamp with only the
> date part, including leading zeroes, is always length 10 (characters). It
> correctly interprets the datetimestamp values, but unfortunately translates
> them into what appear to be numeric type. I am actually uncertain precisely
> what is happening, as I am very new to R and have most certainly stretched
> myself in writing this code. I think perhaps it returns a list and
> something associated with this aspect makes it "forget" the data type is
> POSIXct or at least how such a type should be displayed as text or what to
> do about it.
>
> PLEA:
>
> Please, can anyone give any help whatsoever, however tenuous?
>
> CODE, DATA & RESULTS:
>
> Function to read required data, intended to make the datetime column of the
> data (example given further below) into POSIXct values:
> <<<
> spot_frequency_readin <- function(file,nrows=-1) {
>
>   # create temp class
>   setClass("t_class2_", representation("character"))
>   setAs("character", "t_class2_", function(from) {
>     sapply(from, function(x) {
>       if (nchar(x)==10) {
>         as.POSIXct(strptime(x,format="%d/%m/%Y"))
>       }
>       else {
>         as.POSIXct(strptime(x,format="%d/%m/%Y %H:%M:%S"))
>       }
>     })
>   })
>
>   # (for format symbols, see "R Reference Card")
>
>   # read the file (TSV)
>   file <- read.delim(file, header=TRUE, comment.char = "", nrows=nrows,
>                      as.is=FALSE, col.names=c("DATETIME", "FREQ"),
>                      colClasses=c("t_class2_", "numeric"))
>
>   # remove it now that we are done with it
>   removeClass("t_class2_")
>
>   return(file)
> }
>>>>
> This appears to work as regards processing each row of data correctly,
> but the values returned look like numeric equivalents of POSIXct, as opposed
> to the expected character-based (string) equivalents:
>
>
> Example Data:
> <<<
> DATETIME FREQ
> 01/09/2009 59.036
> 01/09/2009 00:00:01 58.035
> 01/09/2009 00:00:02 53.035
> 01/09/2009 00:00:03 47.033
> 01/09/2009 00:00:04 52.03
> 01/09/2009 00:00:05 55.025
>>>>
>
>
> Example Function Call:
> <<<
>> spot = spot_frequency_readin("mydatafile.txt",4)
>>>>
>
>
> Result of Example Function Call:
> <<<
>> spot[1]
> DATETIME
>
> 1 1251759600
> 2 1251759601
> 3 1251759602
> 4 1251759603
>>>>
>
>
> What I ideally wanted to see (whether or not the time part of the
> datetimestamp at midnight was displayed):
> <<<
>> spot[1]
> DATETIME
>
> 01/09/2009 00:00:00
> 01/09/2009 00:00:01
> 01/09/2009 00:00:02
> 01/09/2009 00:00:03
> 01/09/2009 00:00:04
>>>>
>
>
> For the function as defined above using 'sapply':
>> spot[,1]
>          01/09/2009 01/09/2009 00:00:01 01/09/2009 00:00:02 01/09/2009 00:00:03
>          1251759600          1251759601          1251759602          1251759603
>
> This was unexpected - it seems to have displayed the datetimestamp values
> both as per my defined character-string representation and as numeric
> values.
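As to why the class gets lost: sapply() simplifies its result into a plain vector, and that step drops the POSIXct class attribute, leaving the underlying seconds-since-epoch numbers with the input strings as names. A minimal sketch of the effect, assuming two of your sample strings; parse_one is a hypothetical helper standing in for the body of your setAs() function, and tz is pinned to "UTC" only to make the numbers reproducible:

```r
# Two of the sample strings from the original post.
x <- c("01/09/2009", "01/09/2009 00:00:01")

# Hypothetical helper: same conditional parsing as in the setAs() function.
parse_one <- function(s) {
  fmt <- if (nchar(s) == 10) "%d/%m/%Y" else "%d/%m/%Y %H:%M:%S"
  as.POSIXct(strptime(s, format = fmt, tz = "UTC"))
}

via_sapply <- sapply(x, parse_one)   # simplified to a named numeric vector
class(via_sapply)                    # "numeric"
names(via_sapply)                    # the original strings

via_lapply <- lapply(x, parse_one)   # a list; each element is still POSIXct
class(via_lapply[[1]])               # "POSIXct" "POSIXt"
```

So the values are parsed correctly; only the class is gone, which is why the column prints as numbers.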
Those numbers are seconds since the epoch, so they can be converted back to POSIXct:
> as.POSIXct(spot$DATETIME, origin="1970-01-01")
               01/09/2009       01/09/2009 00:00:01       01/09/2009 00:00:02
"2009-09-01 05:00:00 EDT" "2009-09-01 05:00:01 EDT" "2009-09-01 05:00:02 EDT"
      01/09/2009 00:00:03
"2009-09-01 05:00:03 EDT"
If you want to get rid of the somewhat extraneous names:
> unname(as.POSIXct(spot$DATETIME, origin="1970-01-01") )
[1] "2009-09-01 05:00:00 EDT" "2009-09-01 05:00:01 EDT" "2009-09-01 05:00:02 EDT"
[4] "2009-09-01 05:00:03 EDT"
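One caution about the clock times: the numbers are seconds counted from 1970-01-01 00:00:00 UTC, so the zone in which the origin is read, and the zone in which the result is printed, both affect what you see. A hedged sketch, assuming a current version of R and using the first value from your output; passing tz explicitly is my suggestion here, not something the calls above do:

```r
secs <- 1251759600   # first DATETIME value from the original post

# Read the origin as UTC and keep the result tagged as UTC.
utc <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
format(utc, "%Y-%m-%d %H:%M:%S", tz = "UTC")   # "2009-08-31 23:00:00"

# The same instant on a UK clock (assumes the "Europe/London" zone is available).
format(utc, "%Y-%m-%d %H:%M:%S", tz = "Europe/London")
```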
If you want a variable that stays that way:
> spot$D2 <- as.POSIXct(spot$DATETIME, origin="1970-01-01")
> spot
    DATETIME   FREQ                  D2
1 1251777600 59.036 2009-09-01 05:00:00
2 1251777601 58.035 2009-09-01 05:00:01
3 1251777602 53.035 2009-09-01 05:00:02
4 1251777603 47.033 2009-09-01 05:00:03
Or you could overwrite spot$DATETIME.
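Another route, if you would rather not go through the numbers at all, is to skip sapply() and convert the whole column in one vectorized call, padding the date-only entries with a midnight time first. A sketch only: parse_stamp is a name I made up, and tz = "UTC" is an assumption you would replace with the zone your data were recorded in.

```r
# Hypothetical helper: convert dd/mm/yyyy[ HH:MM:SS] strings to POSIXct in one pass.
parse_stamp <- function(x, tz = "UTC") {
  x <- as.character(x)
  # Entries of exactly 10 characters hold only the date; append midnight.
  date_only <- !is.na(x) & nchar(x) == 10
  x[date_only] <- paste(x[date_only], "00:00:00")
  as.POSIXct(x, format = "%d/%m/%Y %H:%M:%S", tz = tz)
}

stamps <- parse_stamp(c("01/09/2009", "01/09/2009 00:00:01"))
class(stamps)                          # "POSIXct" "POSIXt"
format(stamps, "%d/%m/%Y %H:%M:%S")    # text in the dd/mm/yyyy layout
diff(stamps)                           # date arithmetic works: 1 second apart
```

With this you could read DATETIME in as "character" via colClasses and then apply parse_stamp() to the column, which avoids the temporary class altogether.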
>
> Alternatively if I replace the 'sapply' by a 'lapply' then I get something
> closer to what I expect. It is at least what looks like R's default text
> representation for POSIXct datetimes, even if it is not in my preferred
> format.
> <<<
>> spot[,1]
>
> [[1]]
> [1] "2009-09-01 BST"
>
> [[2]]
> [1] "2009-09-01 00:00:01 BST"
>
> [[3]]
> [1] "2009-09-01 00:00:02 BST"
>
> [[4]]
> [1] "2009-09-01 00:00:03 BST"
>>>>
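The lapply() version keeps each element as POSIXct, but a list cannot serve directly as a date-time column. One way to collapse it while keeping the class is c(), which has a POSIXct method; format() then gives the dd/mm/yyyy display whenever you need it as text. A minimal sketch, assuming three of your sample strings and tz = "UTC":

```r
x <- c("01/09/2009", "01/09/2009 00:00:01", "01/09/2009 00:00:02")

# Same conditional parsing as before, but returning a list of POSIXct values.
parsed <- lapply(x, function(s) {
  fmt <- if (nchar(s) == 10) "%d/%m/%Y" else "%d/%m/%Y %H:%M:%S"
  as.POSIXct(strptime(s, format = fmt, tz = "UTC"))
})

# c() dispatches to the POSIXct method, so the combined vector keeps its class.
stamps <- do.call(c, parsed)
class(stamps)                                      # "POSIXct" "POSIXt"

# Display in the preferred layout; the stored values remain POSIXct.
format(stamps, "%d/%m/%Y %H:%M:%S", tz = "UTC")
```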
>
> --
David Winsemius, MD
Heritage Laboratories
West Hartford, CT