[R] Time and db precision

Thu May 26 17:04:24 CEST 2011

Thanks Marc,

I had just come up with another, slightly more convoluted solution. Add as.is = TRUE to the query and then get the timetoken with
  timetoken <- df$timestamp[df$timestamp == max(as.POSIX(df$timestamp))]

While it looks like options(digits.secs = 6) works, I worry that theoretically it just pushes the problem down to another level of decimal points. With the solution above, I apparently get the exact same value that was in the database.

Interestingly, 
  timetoken <- max(as.POSIX(df$timestamp))
does not appear to give me the same result.

Demo:
> a <- "2011-05-25 22:15:11.027116000"
> b <- "2011-05-25 22:15:11.027117000"
> c <- "2011-05-25 22:15:11.027118000"
> d <- c(a, b, c)
> d
[1] "2011-05-25 22:15:11.027116000" "2011-05-25 22:15:11.027117000" "2011-05-25 22:15:11.027118000"
> d[d == max(as.POSIXct(d))]
[1] "2011-05-25 22:15:11.027118000"
> max(as.POSIXct(d))
[1] "2011-05-25 22:15:11 COT"

--- On Thu, 5/26/11, Marc Schwartz <marc_schwartz at me.com> wrote:

> From: Marc Schwartz <marc_schwartz at me.com>
> Subject: Re: [R] Time and db precision
> To: "Mikkel Grum" <mi2kelgrum at yahoo.com>
> Cc: "R Help" <r-help at r-project.org>
> Date: Thursday, May 26, 2011, 8:22 AM
> 
> On May 25, 2011, at 6:25 PM, Mikkel Grum wrote:
> 
> > I have a loop that regularly checks for new data to
> analyse in my database. In order to facilitate this, the
> database table has a timestamp column with the time that the
> data was inserted into the database. Something like this:
> > 
> > while (....) {
> >    load(timetoken.Rdata)
> >    df <- sqlQuery(con, paste("SELECT *
> FROM tabledf WHERE timestamp > ", timetoken, sep = ""))
> >    analyse(df)
> >    timetoken <- max(df$timestamp)
> >    save(timetoken, file =
> "timetoken.Rdata")
> >    Sys.sleep(60)
> > }
> > 
> > Now this used to work fairly well with R and
> PostgreSQL on Windows, but on Ubuntu, I'm getting a lot of
> data being pulled up again and again. I suspect what is
> happening is that R is rounding off to the nearest second,
> while PostgreSQL is using a much higher level of precision,
> so that if no new data has come in in the meantime, chances
> are fairly high (50% ??) that the PostgreSQL timestamp is
> higher than the version that has been rounded off by R.
> > 
> > Is there any way of recording the timestamp in R
> exactly as it is in PostgreSQL? Or is there another way of
> dealing with this??
> > 
> > sessionInfo()
> > R version 2.11.1 (2010-05-31)
> > x86_64-pc-linux-gnu
> > 
> > locale:
> > [1] C
> > 
> > attached base packages:
> > [1] stats     graphics 
> grDevices utils     datasets 
> methods   base
> > 
> > other attached packages:
> > [1] RODBC_1.3-1
> > 
> > All assistance greatly appreciated.
> > 
> > Mikkel
> 
> 
> This query is better suited for R-SIG-DB:
> 
>   https://stat.ethz.ch/mailman/listinfo/r-sig-db
> 
> That being said:
> 
> See ?POSIXct
> 
> Check the actual output of paste("SELECT * FROM tabledf
> WHERE timestamp > ", timetoken, sep = "") to see what
> value 'timetoken' is actually taking as it is used in the
> query construct. As is noted in the above help file, be sure
> that options("digits.secs") is properly set, since the
> default will be to round printed output to the nearest
> second:
> 
> # A clean R session on OSX
> > options("digits.secs")
> $digits.secs
> NULL
> 
> # return current date/time as POSIXct
> 
> > Sys.time()
> [1] "2011-05-26 08:11:37 CDT"
> 
> options(digits.secs = 6)
> 
> > Sys.time()
> [1] "2011-05-26 08:12:07.080329 CDT"
> 
> 
> options(digits.secs = 0)
> 
> > paste("SELECT * FROM tabledf WHERE timestamp > ",
> Sys.time(), sep = "")
> [1] "SELECT * FROM tabledf WHERE timestamp > 2011-05-26
> 08:15:02"
>  
> options(digits.secs = 6)
> 
> > paste("SELECT * FROM tabledf WHERE timestamp > ",
> Sys.time(), sep = "")
> [1] "SELECT * FROM tabledf WHERE timestamp > 2011-05-26
> 08:15:12.005103"
> 
> 
> HTH,
> 
> Marc Schwartz
> 
>