[R] Date-Time-Stamp input method for user-specific formats

Don MacQueen macq at llnl.gov
Tue Oct 6 00:18:49 CEST 2009


Off the top of my head, I think you're working too hard at this.

I would read in the timestamp column as a character string. Then
find those where the string length is too short [using nchar()],
append "00:00:00" to those [using paste(), which inserts the separating
space by default], and then convert to POSIXct [using as.POSIXct() with
format="%d/%m/%Y %H:%M:%S"].
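A minimal sketch of that approach, assuming a tab-separated file laid out like the example data quoted below (a header line, a DATETIME column in day/month/year order and a numeric FREQ column). The function name read_spot is hypothetical; read.delim(), nchar(), paste() and as.POSIXct() are standard R.

```r
# Hypothetical helper illustrating the nchar()/paste()/as.POSIXct() approach.
# Assumes a tab-separated file with a DATETIME column ("dd/mm/yyyy" or
# "dd/mm/yyyy HH:MM:SS") followed by a numeric FREQ column.
read_spot <- function(file, nrows = -1) {
  dat <- read.delim(file, header = TRUE, nrows = nrows,
                    colClasses = c("character", "numeric"))
  # Midnight stamps carry only the date part, i.e. exactly 10 characters
  short <- nchar(dat$DATETIME) == 10
  # paste() uses sep = " ", giving e.g. "01/09/2009 00:00:00"
  dat$DATETIME[short] <- paste(dat$DATETIME[short], "00:00:00")
  # One vectorized conversion, so the column keeps its POSIXct class
  dat$DATETIME <- as.POSIXct(dat$DATETIME, format = "%d/%m/%Y %H:%M:%S")
  dat
}
```

With the example data, read_spot("mydatafile.txt", 4) should give a data frame whose DATETIME column is POSIXct, with the first row at midnight.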

No need to define new classes. Simple and easy to understand.

-Don

At 2:14 PM -0700 10/5/09, esp wrote:
>Date-Time-Stamp input method to correctly interpret user-specific
>formats: the coding is 90% there, based on the example at
>http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html
>...anyone got the last 10% please?
>
>CONTEXT:
>
>Data is received where one of the columns is a datetimestamp.  At midnight,
>the value represented as text in this column consists of just the date part,
>e.g. "01/09/2009".  At other times, the value in the column contains both
>date and time e.g. "01/09/2009 00:00:01".  The goal is to read it into R as
>an appropriate data type, where for example date arithmetic can be
>performed.  As far as I can tell, the most appropriate such data type is
>POSIXct.  The trick then is to read in the datetimestamps in the data as
>this type.
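POSIXct is indeed suited to that: the values are stored as seconds since 1970-01-01 00:00:00 UTC, so adding a number shifts a timestamp by that many seconds, and subtracting two gives a difftime. A small illustration; the time zone "UTC" is an assumption chosen only to keep it independent of the local zone.

```r
# Parse one timestamp in the question's day/month/year layout (tz assumed UTC)
t0 <- as.POSIXct("01/09/2009 00:00:00", format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
t1 <- t0 + 90                                  # adding a number shifts by seconds
elapsed <- difftime(t1, t0, units = "secs")
elapsed                                        # Time difference of 90 secs
t1 > t0                                        # comparisons work too
```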
>
>PROBLEM:
>
>POSIXct defaults to a text representation almost but not quite like my
>received data.  The main difference is that the POSIXct date part is in
>reverse order, e.g. "2009-09-01".  It is possible to define a different
>format in which the date and time parts look like my data, but
>datetimestamps where only the date part is present (as in the case of my
>midnight data) are then interpreted as NA, i.e. undefined.
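A short illustration of why a single format cannot cover both kinds of value, using two strings from the example data. strptime(), which as.POSIXct() calls for character input with a format, returns NA when the input ends before the format does, and ignores trailing characters when the format ends first.

```r
x <- c("01/09/2009", "01/09/2009 00:00:01")
# A format that matches the timed values cannot match the date-only one
with_time <- as.POSIXct(x, format = "%d/%m/%Y %H:%M:%S")
is.na(with_time)                  # TRUE FALSE
# A date-only format parses both, but the trailing time part is ignored,
# so the second value silently becomes midnight as well
date_only <- as.POSIXct(x, format = "%d/%m/%Y")
format(date_only, "%H:%M:%S")     # "00:00:00" "00:00:00"
```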
>
>SOLUTION (ALMOST):
>
>There is a workaround (based on the example at
>http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html).  It is possible to
>define a class then read the data in as this class.  For such a class it is
>possible to define a class method, in terms of a function, for translating a
>text (character string) representation into a value. In that function, one
>can use a conditional expression to treat midnight datetimestamps
>differently from those at other times of day.  The example below does that.
>In order to apply this function over all of the datetimestamp values in the
>column, it is necessary to use something like R's 'sapply' function.
>
>SNAG:
>
>The function below implements this approach.  A datetimestamp with only the
>date part, including leading zeroes, is always length 10 (characters).   It
>correctly interprets the datetimestamp values, but unfortunately translates
>them into what appear to be numeric values.  I am actually uncertain precisely
>what is happening, as I am very new to R and have most certainly stretched
>myself in writing this code.  I think perhaps it returns a list and
>something associated with this aspect makes it "forget" the data type is
>POSIXct or at least how such a type should be displayed as text or what to
>do about it.
>
>PLEA:
>
>Please, can anyone give any help whatsoever, however tenuous?
>
>CODE, DATA & RESULTS:
>
>Function to read the required data, intended to make the datetime column of the
>data (example given further below) into POSIXct values:
><<<
>spot_frequency_readin <- function(file, nrows = -1) {
>
>  # create temp class
>  setClass("t_class2_", representation("character"))
>  setAs("character", "t_class2_", function(from) {
>    sapply(from, function(x) {
>      if (nchar(x) == 10) {
>        as.POSIXct(strptime(x, format = "%d/%m/%Y"))
>      } else {
>        as.POSIXct(strptime(x, format = "%d/%m/%Y %H:%M:%S"))
>      }
>    })
>  })
>
>  # (for format symbols, see "R Reference Card")
>
>  # read the file (TSV)
>  file <- read.delim(file, header = TRUE, comment.char = "", nrows = nrows,
>                     as.is = FALSE, col.names = c("DATETIME", "FREQ"),
>                     colClasses = c("t_class2_", "numeric"))
>
>  # remove it now that we are done with it
>  removeClass("t_class2_")
>
>  return(file)
>}
>>>>
>This appears to process each row of data correctly, but the values returned
>look like numeric equivalents of POSIXct, as opposed to the expected
>character-based (string) representations:
>
>
>Example Data:
><<<
>DATETIME	FREQ
>01/09/2009	59.036
>01/09/2009 00:00:01	58.035
>01/09/2009 00:00:02	53.035
>01/09/2009 00:00:03	47.033
>01/09/2009 00:00:04	52.03
>01/09/2009 00:00:05	55.025
>>>>
>
>
>Example Function Call:
><<<
>>  spot = spot_frequency_readin("mydatafile.txt",4)
>>>>
>
>
>Result of Example Function Call:
><<<
>>  spot[1]
>     DATETIME
>
>1 1251759600
>2 1251759601
>3 1251759602
>4 1251759603
>>>>
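If you would rather keep the setClass()/setAs()/colClasses route from the linked example, the coercion can convert the whole column at once instead of element by element; it is the sapply() call that turns the result into plain numbers. A sketch under that assumption, reusing the class name t_class2_ from the code above (declared here with contains = "character", the current spelling of the same thing):

```r
library(methods)

# Same temporary class as in the question; the coercion is now vectorized
setClass("t_class2_", contains = "character")
setAs("character", "t_class2_", function(from) {
  # pad date-only (midnight) stamps so that one format fits every value
  from <- ifelse(nchar(from) == 10, paste(from, "00:00:00"), from)
  as.POSIXct(from, format = "%d/%m/%Y %H:%M:%S")
})
```

read.delim(file, colClasses = c("t_class2_", "numeric")) then hands the column to this method via as(), and removeClass("t_class2_") can still be called afterwards, as in the original.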
>
>
>What I ideally wanted to see (whether or not the time part of the
>datetimestamp at midnight was displayed):
><<<
>>  spot[1]
>     DATETIME
>
>01/09/2009 00:00:00
>01/09/2009 00:00:01
>01/09/2009 00:00:02
>01/09/2009 00:00:03
>01/09/2009 00:00:04
>>>>
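The stored value and the way it is printed are separate matters: POSIXct holds seconds, and format() produces whatever text layout you ask for. Without an explicit format, the time part is left out when every value in the vector is at midnight; with one, it is always shown. A sketch, assuming the column has already been converted to POSIXct:

```r
x <- as.POSIXct(c("01/09/2009 00:00:00", "01/09/2009 00:00:01"),
                format = "%d/%m/%Y %H:%M:%S")
# Explicit output format: day/month/year with the time always included
format(x, "%d/%m/%Y %H:%M:%S")   # "01/09/2009 00:00:00" "01/09/2009 00:00:01"
```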
>
>
>For the function as defined above using 'sapply':
>>  spot[,1]
>          01/09/2009 01/09/2009 00:00:01 01/09/2009 00:00:02 01/09/2009 00:00:03
>          1251759600          1251759601          1251759602          1251759603
>
>This was unexpected - it seems to have displayed the datetimestamp values
>both as per my defined character-string representation and as numeric
>values. 
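What is printed there is a named numeric vector rather than two representations: sapply() simplifies its list of results with unlist(), which drops the POSIXct class attribute, and because the input is a character vector it uses the input strings as names (USE.NAMES = TRUE). The numbers are seconds since 1970-01-01 00:00:00 UTC. A small demonstration with two of the example values:

```r
x <- c("01/09/2009 00:00:01", "01/09/2009 00:00:02")
r <- sapply(x, function(s) as.POSIXct(s, format = "%d/%m/%Y %H:%M:%S"))
class(r)   # "numeric": the POSIXct class was lost during simplification
names(r)   # the input strings, used as names
# The seconds can be turned back into date-times by supplying the origin
back <- as.POSIXct(unname(r), origin = "1970-01-01")
```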
>
>Alternatively, if I replace 'sapply' with 'lapply', then I get something
>closer to what I expect.  It is at least what looks like R's default text
>representation for POSIXct datetimes, even if it is not in my preferred
>format.
><<<
>>  spot[,1]
>
>[[1]]
>[1] "2009-09-01 BST"
>
>[[2]]
>[1] "2009-09-01 00:00:01 BST"
>
>[[3]]
>[1] "2009-09-01 00:00:02 BST"
>
>[[4]]
>[1] "2009-09-01 00:00:03 BST"
>>>>
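lapply() keeps each result intact, so the column becomes a list of length-one POSIXct objects. Each element is printed on its own with the default format, which leaves out the time when the value is at midnight (hence the first entry showing only the date). Combining the list with c(), which has a POSIXct method, gives an ordinary POSIXct vector. A sketch using two of the example values:

```r
x <- c("01/09/2009", "01/09/2009 00:00:01")
lst <- lapply(x, function(s) {
  if (nchar(s) == 10) as.POSIXct(s, format = "%d/%m/%Y")
  else as.POSIXct(s, format = "%d/%m/%Y %H:%M:%S")
})
class(lst)             # "list": one POSIXct object per element
v <- do.call(c, lst)   # c() dispatches to its POSIXct method
class(v)               # "POSIXct" "POSIXt"
```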
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.


-- 
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062



