[R] Date-Time-Stamp input method for user-specific formats
Gabor Grothendieck
ggrothendieck at gmail.com
Tue Oct 6 14:20:24 CEST 2009
See below.
On Mon, Oct 5, 2009 at 6:50 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> Try this. First we read a line at a time into L except for the
> header. Then we use strapply to match on the given pattern. It
> passes the backreferences (the portions within parentheses in the
> pattern) to the function (defined via a formula) whose implicit
> arguments are x, y and z. That function returns two columns which are
> in the required form so that in the next statement we convert one to
> chron and the other to numeric. See R News 4/1 for more about dates
> and times.
>
> library(gsubfn) # strapply
> library(chron) # as.chron
> Lines <- "DATETIME FREQ
> 01/09/2009 59.036
> 01/09/2009 00:00:01 58.035
> 01/09/2009 00:00:02 53.035
> 01/09/2009 00:00:03 47.033
> 01/09/2009 00:00:04 52.03
> 01/09/2009 00:00:05 55.025"
> L <- readLines(Lines)[-1]
This line should be L <- readLines(textConnection(Lines))[-1]
>
> pat <- "(../../....) (..:..:..){0,1} *([0-9.]+)"
> s <- strapply(L, pat, ~ c(paste(x, y, "00:00:00"), z), simplify = rbind)
>
> fmt <- "%m/%d/%Y %H:%M:%S"
> DF <- data.frame(Time = as.chron(s[,1], fmt), Freq = as.numeric(s[,2]))
>
> DF
>
> The final output looks like this:
>
>> DF
> Time Freq
> 1 (01/09/09 00:00:00) 59.036
> 2 (01/09/09 00:00:01) 58.035
> 3 (01/09/09 00:00:02) 53.035
> 4 (01/09/09 00:00:03) 47.033
> 5 (01/09/09 00:00:04) 52.030
> 6 (01/09/09 00:00:05) 55.025
>
> If the times are unique you could consider making a zoo object out of
> it by replacing the DF<- statement with:
>
> library(zoo)
> z <- zoo(as.numeric(s[,2]), as.chron(s[,1], fmt))
>
> See the three vignettes in the zoo package.
>
>
> On Mon, Oct 5, 2009 at 5:14 PM, esp <davidgaryesp at gmail.com> wrote:
>>
>> Date-Time-Stamp input method to correctly interpret user-specific
>> formats:coding is 90% there - based on exmple at
>> http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html
>> ...anyone got the last 10% please?
>>
>> CONTEXT:
>>
>> Data is received where one of the columns is a datetimestamp. At midnight,
>> the value represented as text in this column consists of just the date part,
>> e.g. "01/09/2009". At other times, the value in the column contains both
>> date and time e.g. "01/09/2009 00:00:01". The goal is to read it into R as
>> an appropriate data type, where for example date arithmetic can be
>> performed. As far as I can tell, the most appropriate such data type is
>> POSIXct. The trick then is to read in the datetimestamps in the data as
>> this type.
>>
>> PROBLEM:
>>
>> POSIXct defaults to a text representation almost but not quite like my
>> received data. The main difference is that the POSIXct date part is in
>> reverse order, e.g. "2009-09-01". It is possible to define a different
>> format where date and time parts look like my data but when encountering
>> datetimestamps where only the the date part is present (as in the case of my
>> midnight data) then this is interpreted as NA i.e. undefined.
>>
>> SOLUTION (ALMOST):
>>
>> There is a workaround (based on example at
>> http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html). It is possible to
>> define a class then read the data in as this class. For such a class it is
>> possible to define a class method, in terms of a function, for translating a
>> text (character string) representation into a value. In that function, one
>> can use a conditional expression to treat midnight datetimestamps
>> differently from those at other times of day. The example below does that.
>> In order to apply this function over all of the datetimestamp values in the
>> column, it is necessary to use something like R's 'sapply' function.
>>
>> SNAG:
>>
>> The function below implements this approach. A datetimestamp with only the
>> date part, including leading zeroes, is always length 10 (characters). It
>> correctly interprets the datetimestamp values, but unfortunately translates
>> them into what appear to be numeric type. I am actually uncertain precisely
>> what is happening, as I am very new to R and have most certainly stretched
>> myself in writing this code. I think perhaps it returns a list and
>> something associated with this aspect makes it "forget" the data type is
>> POSIXct or at least how such a type should be displayed as text or what to
>> do about it.
>>
>> PLEA:
>>
>> Please, can anyone give any help whatsoever, however tenuous?
>>
>> CODE, DATA & RESULTS:
>>
>> Function to Read required data, intended to make the datetime column of the
>> data (example given further below) into POSIXct values:
>> <<<
>> spot_frequency_readin <- function(file,nrows=-1) {
>>
>> # create temp class
>> setClass("t_class2_", representation("character"))
>> setAs("character", "t_class2_", function(from) {sapply(from, function(x) {
>> if (nchar(x)==10) {
>> as.POSIXct(strptime(x,format="%d/%m/%Y"))
>> }
>> else {
>> as.POSIXct(strptime(x,format="%d/%m/%Y %H:%M:%S"))
>> }
>> }
>> )
>> }
>> )
>>
>> #(for format symbols, see "R Reference Card")
>>
>> # read the file (TSV)
>> file <- read.delim(file, header=TRUE, comment.char = "", nrows=nrows,
>> as.is=FALSE, col.names=c("DATETIME", "FREQ"), colClasses=c("t_class2_",
>> "numeric") )
>>
>> # remove it now that we are done with it
>> removeClass("t_class2_")
>>
>> return(file)
>> }
>>>>>
>> This appears to work apart as regards processing each row of data correctly,
>> but the values returned look like numeric equivalents of POSIXct, as opposed
>> to the expected character-based (string) equivalents:
>>
>>
>> Example Data:
>> <<<
>> DATETIME FREQ
>> 01/09/2009 59.036
>> 01/09/2009 00:00:01 58.035
>> 01/09/2009 00:00:02 53.035
>> 01/09/2009 00:00:03 47.033
>> 01/09/2009 00:00:04 52.03
>> 01/09/2009 00:00:05 55.025
>>>>>
>>
>>
>> Example Function Call:
>> <<<
>>> spot = spot_frequency_readin("mydatafile.txt",4)
>>>>>
>>
>>
>> Result of Example Function Call:
>> <<<
>>> spot[1]
>> DATETIME
>>
>> 1 1251759600
>> 2 1251759601
>> 3 1251759602
>> 4 1251759603
>>>>>
>>
>>
>> What I ideally wanted to see (whether or not the time part of the
>> datetimestamp at midnight was displayed):
>> <<<
>>> spot[1]
>> DATETIME
>>
>> 01/09/2009 00:00:00
>> 01/09/2009 00:00:01
>> 01/09/2009 00:00:02
>> 01/09/2009 00:00:03
>> 01/09/2009 00:00:04
>>>>>
>>
>>
>> For the function as defined above using 'sapply'
>>> spot[,1]
>> 01/09/2009 01/09/2009 00:00:01 01/09/2009 00:00:02 01/09/2009
>> 00:00:03
>> 1251759600 1251759601 1251759602
>> 1251759603
>>
>> This was unexpected - it seems to have displayed the datetimestamp values
>> both as per my defined character-string representation and as numeric
>> values.
>>
>> Alternatively ifI replace the 'sapply' by a 'lapply' then I get something
>> closer to what I expect. It is at least what looks like R's default text
>> representation for POSIXct datetimes, even if it is not in my preferred
>> format.
>> <<<
>>> spot[,1]
>>
>> [[1]]
>> [1] "2009-09-01 BST"
>>
>> [[2]]
>> [1] "2009-09-01 00:00:01 BST"
>>
>> [[3]]
>> [1] "2009-09-01 00:00:02 BST"
>>
>> [[4]]
>> [1] "2009-09-01 00:00:03 BST"
>>>>>
>>
>> --
>> View this message in context: http://www.nabble.com/Date-Time-Stamp-input-method-for-user-specific-formats-tp25757018p25757018.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
More information about the R-help
mailing list