[Rd] Another issue with Sys.timezone

Martin Maechler maechler at stat.math.ethz.ch
Fri Oct 20 09:15:42 CEST 2017


>>>>> Stephen Berman <stephen.berman at gmx.net>
>>>>>     on Thu, 19 Oct 2017 17:12:50 +0200 writes:

    > On Wed, 18 Oct 2017 18:09:41 +0200 Martin Maechler <maechler at stat.math.ethz.ch> wrote:
    >>>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
    >>>>>>> on Mon, 16 Oct 2017 19:13:31 +0200 writes:

    > (I also included a reply to part of this response of yours below.)

    >>>>>>> Stephen Berman <stephen.berman at gmx.net>
    >>>>>>> on Sun, 15 Oct 2017 01:53:12 +0200 writes:
    >> 
    >>> > (I reported the test failure mentioned below to R-help but was advised
    >>> > that this list is the right one to address the issue; in the meantime I
    >>> > investigated the matter somewhat more closely, including searching
    >>> > recent R-devel postings, since I haven't been following this list.)
    >>> 
    >>> > Last May there were two reports here of problems with Sys.timezone, one
    >>> > where the zoneinfo directory is in a nonstandard location
    >>> > (https://stat.ethz.ch/pipermail/r-devel/2017-May/074267.html) and the
    >>> > other where the system lacks the file /etc/localtime
    >>> > (https://stat.ethz.ch/pipermail/r-devel/2017-May/074275.html).  My
    >>> > system exhibits a third case: it lacks /etc/timezone and does not set TZ
    >>> > systemwide, but it does have /etc/localtime, which is a copy of, rather
    >>> > than a symlink to, a file under zoneinfo.  On this system Sys.timezone()
    >>> > returns NA and the Sys.timezone test in reg-tests-1d fails.  However, on
    >>> > my system I can get the (abbreviated) timezone in R by using as.POSIXlt,
    >>> > e.g. as.POSIXlt(Sys.time())$zone.  If Sys.timezone took advantage of
    >>> > this, e.g. as below, it would be useful on such systems as mine and the
    >>> > regression test would pass.
    >>> 
    >>> > my.Sys.timezone <- 
    >>> > 	function (location = TRUE) 
    >>> > {
    >>> > 	tz <- Sys.getenv("TZ", names = FALSE)
    >>> > 	if (!location || nzchar(tz)) 
    >>> > 	    return(Sys.getenv("TZ", unset = NA_character_))
    >>> > 	lt <- normalizePath("/etc/localtime")
    >>> > 	if (grepl(pat <- "^/usr/share/zoneinfo/", lt) ||
    >>> > 	    grepl(pat <- "^/usr/share/zoneinfo.default/", lt)) 
    >>> > 	    sub(pat, "", lt)
    >>> > 	else if (lt == "/etc/localtime")
    >>> > 	    if (!file.exists("/etc/timezone"))
    >>> > 		return(as.POSIXlt(Sys.time())$zone)
    >>> > 	    else if (dir.exists("/usr/share/zoneinfo") && {
    >>> > 		info <- file.info(normalizePath("/etc/timezone"), extra_cols = FALSE)
    >>> > 		(!info$isdir && info$size <= 200L)
    >>> > 	    } && {
    >>> > 		tz1 <- tryCatch(readBin("/etc/timezone", "raw", 200L), 
    >>> > 				error = function(e) raw(0L))
    >>> > 		length(tz1) > 0L && all(tz1 %in% as.raw(c(9:10, 13L, 32:126)))
    >>> > 	    } && {
    >>> > 		tz2 <- gsub("^[[:space:]]+|[[:space:]]+$", "", rawToChar(tz1))
    >>> > 		tzp <- file.path("/usr/share/zoneinfo", tz2)
    >>> > 		file.exists(tzp) && !dir.exists(tzp) &&
    >>> > 		    identical(file.size(normalizePath(tzp)), file.size(lt))
    >>> > 	    }) 
    >>> > 		tz2
    >>> > 	    else NA_character_
    >>> > }
    >>> 
    >>> > One problem with this is that the zone component of as.POSIXlt only
    >>> > holds the abbreviated timezone, not the Olson name.  
    >>> 
    >>> Yes, indeed.  So, really only for  Sys.timezone(location = FALSE)  this
    >>> should be given, for the default  location = TRUE   it should
    >>> still give NA (i.e. NA_character_)  in your setup.
    >>> 
    >>> Interestingly, the Windows version of Sys.timezone(location =
    >>> FALSE) uses something like your proposal,  and I tend to think that
    >>> -- again only for location=FALSE -- this should be used on
    >>> non-Windows as well, at least instead of returning  NA  then.
    >>> 
    >>> Also for me on 3 different Linuxen (Fedora 24, F. 26, and ubuntu
    >>> 14.04 LTS), I get
    >>> 
    >>> > Sys.timezone()
    >>> [1] "Europe/Zurich"
    >>> > Sys.timezone(FALSE)
    >>> [1] NA
    >>> > 
    >>> 
    >>> whereas on Windows I get Europe/Berlin for the first (why on
    >>> earth - I'm really in Zurich) and get  "CEST" ("Central European Summer Time") 
    >>> for the 2nd one instead of NA ... simply using a smarter version
    >>> of your proposal.   The windows source is
    >>> in R's source at  src/library/base/R/windows/system.R :
    >>> 
    >>> Sys.timezone <- function(location = TRUE)
    >>> {
    >>> tz <- Sys.getenv("TZ", names = FALSE)
    >>> if(nzchar(tz)) return(tz)
    >>> if(location) return(.Internal(tzone_name()))
    >>> z <- as.POSIXlt(Sys.time())
    >>> zz <- attr(z, "tzone")
    >>> if(length(zz) == 3L) zz[2L + z$isdst] else zz[1L]
    >>> }
    >>> 
    >>> From what I read, the last three lines also work in your setup
    >>> where it seems zz would be of length 1, right ?

    > Those lines do indeed work here, but zz has three elements:

    >> attributes(as.POSIXlt(Sys.time()))$tzone
    > [1] ""     "CET"  "CEST"

{ "but" ??   yes, three elements is what I see too, but for that
  reason there's the  if(length(zz) == 3L) ... }
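
For illustration only, here is a minimal sketch of that selection logic
with a guard for a missing or empty "tzone" attribute.  The name
tz_abbrev() is hypothetical (it is not a base R function), and clamping
an unknown (negative or NA) isdst to 0 is an assumption of this sketch,
not something taken from the R sources:

```r
## Hypothetical helper, not part of base R: return the time zone
## abbreviation that matches the current daylight-saving state.
tz_abbrev <- function(time = Sys.time()) {
    z  <- as.POSIXlt(time)
    zz <- attr(z, "tzone")
    if (length(zz) == 3L) {
        ## zz is c(<name>, <standard abbrev.>, <DST abbrev.>)
        isdst <- z$isdst
        ## assumption: treat an unknown DST state as "not in force"
        isdst <- if (is.na(isdst) || isdst <= 0L) 0L else 1L
        zz[2L + isdst]
    } else if (length(zz) >= 1L) {
        zz[1L]
    } else {
        NA_character_   # no usable "tzone" attribute
    }
}

tz_abbrev()
```

On a setup where "tzone" has three elements, this picks the second or
third one depending on z$isdst; otherwise it falls back to the first
element, or to NA_character_ if there is none.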

    >>> I'd really propose to use these 3 lines in the non-Windows
    >>> version of Sys.timezone .. at the end *instead* of NA_character_
    >>> (or a slightly safer version which gives  NA_character_ if zz is
    >>> of length 0 {e.g. if there is no "tzone" attribute}).
    >>> 
    >>> > I don't know how to
    >>> > get the Olson name using only R functions, but maybe it would be good
    >>> > enough to return the abbreviated timezone where possible, e.g. as above.
    >>> > (On my system I can get the Olson name of the timezone in R with a shell
    >>> > pipeline, e.g.: system("find /usr/share/zoneinfo/ -type f | xargs md5sum
    >>> > | grep $(md5sum /etc/localtime | cut -d ' ' -f 1) | head -n 1 | cut -d
    >>> > '/' -f 5,6"), but the last part of this is tailored to my configuration
    >>> > and the whole thing is not OS-neutral, so it isn't suitable for
    >>> > Sys.timezone.)
    >>> 
    >>> > Steve Berman
    >>> 
    >>> Definitely not.  I still recommend you think of a more portable
    >>> solution for the   `location = TRUE` (default) case in Sys.timezone().
    >>> Returning the non-location form (e.g "CEST") when something like
    >>> "Europe/Zurich" is expected is really not a good idea,
    >>> and you are lucky that the regression test passes "accidentally" ...
    >>> 
    >>> Martin
    >> 
    >> In the meantime, I have committed a common version (Windows and
    >> non-Windows) of  Sys.timezone()  to the R development sources
    >> (aka "R-devel").
    >> 
    >> That now uses  as.POSIXlt(Sys.time())  very similarly to the
    >> above "Windows only" case,  but __only__ for  'location=FALSE'
    >> which is not the default.

    > Thanks, I think that's definitely better than returning NA when
    > `location' is false...

    >> The most current development source is always available (via
    >> 'svn' or alternatively for browsing via your web browser) from
    >> 
    >> https://svn.r-project.org/R/trunk/src/library/base/R/datetime.R

    > ...however, I tried the test that failed for me during `make check' now
    > with this new definition of Sys.timezone() by pasting the definition (as
    > new.Sys.timezone()) and the two lines of the test code into the R console,
    > and this is what happened:

    >> new.Sys.timezone()
    >> new.Sys.timezone(FALSE)
    > [1] "CEST"
    >> (S.t <- new.Sys.timezone())
    > NULL
    >> if(is.na(S.t) || !nzchar(S.t)) stop("could not get timezone")
    > Error in if (is.na(S.t) || !nzchar(S.t)) stop("could not get timezone") : 
    > missing value where TRUE/FALSE needed
    > In addition: Warning message:
    > In is.na(S.t) : is.na() applied to non-(list or vector) of type 'NULL'

    > This is because `location' is true but all the if-clauses in the body
    > following `if(location)' are false, so it returns NULL.  If you add the
    > line `else NA_character_' below the line `tz2', then NA is returned and
    > the test fails as before instead of as above.

Thank you for the perfect diagnosis.  Embarrassingly, I had
dropped this else-clause accidentally.
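
The effect is easy to reproduce in isolation.  The following is a toy
sketch, not the actual Sys.timezone() code; the function names are made
up for the illustration.  An if / else if chain without a final else
evaluates to NULL when no condition holds:

```r
## Toy functions (hypothetical names) showing the missing-else effect.
without_else <- function(x) {
    if (x > 0) {
        "positive"
    } else if (x < 0) {
        "negative"
    }                     # no final else: value is (invisible) NULL
}

with_else <- function(x) {
    if (x > 0) {
        "positive"
    } else if (x < 0) {
        "negative"
    } else {
        NA_character_     # explicit fallback, safe to pass to is.na()
    }
}

is.null(without_else(0))  # no branch taken
is.na(with_else(0))       # fallback branch taken
```

With the explicit fallback, a caller such as the regression test can
apply is.na() to the result and get a length-one logical back.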

    >> As you say yourself, the above workaround using
    >> system("... xargs md5sum ...") is really too platform-specific,
    >> but I'd guess there should be a less error-prone way to get the
    >> long timezone name on your system ...

    > If I understand the zic(8) man page, the files in /usr/share/zoneinfo
    > should contain this information, but I don't know how to extract it,
    > since these are compiled files.  And since on my system /etc/localtime
    > is a copy of one of these compiled files, I don't know of any other way
    > to recover the location name without comparing it to those files.

    >> If that remains "contained" (i.e. small) and works with files
    >> and R's files tools -- e.g. file.*() ones [but not system()],
    >> I'd consider a patch to the above source file
    >> (sent by you to the R-devel mailing list --- or after having
    >> gotten an account there by asking, via bug report & patch
    >> attachment at https://bugs.r-project.org/ )

    > If comparing file size sufficed, that would be easy to do in R;
    > unfortunately, it is not sufficient, since some files designating
    > different time zones in /usr/share/zoneinfo do have the same size.  So
    > the only alternative I can think of is to compare bytes, e.g. with
    > md5sum or with cmp.  Is there some way to do this in R without using
    > system()?

Can't you use
      tz1 <- readBin("/etc/localtime", "raw", 200L)
plus later
      tz2 <- gsub(.......,  rawToChar(tz1))

on your  /etc/localtime file 
almost identically as the current code does for "/etc/timezone" ?
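
As for comparing bytes without system(): a rough sketch, under the
assumption that /etc/localtime is a byte-for-byte copy of some file
below the zoneinfo directory.  The function name match_localtime() and
its arguments are hypothetical; only readBin(), list.files(),
file.size() and friends are existing R functions:

```r
## Hypothetical helper, not part of base R: relative names of all files
## under `zoneinfo` whose contents are identical to `localtime`.
match_localtime <- function(localtime = "/etc/localtime",
                            zoneinfo  = "/usr/share/zoneinfo") {
    if (!file.exists(localtime) || !dir.exists(zoneinfo))
        return(NA_character_)
    size <- file.size(localtime)
    lt   <- readBin(localtime, "raw", n = size)
    files <- list.files(zoneinfo, recursive = TRUE)
    ## cheap filter first: only files of the same size can be identical
    ## (which() also drops NA sizes, e.g. from dangling symlinks)
    files <- files[which(file.size(file.path(zoneinfo, files)) == size)]
    same <- vapply(files, function(f)
        identical(readBin(file.path(zoneinfo, f), "raw", n = size), lt),
        logical(1), USE.NAMES = FALSE)
    hits <- files[same]
    if (length(hits)) hits else NA_character_
}

match_localtime()
```

This may well return more than one name, since several zoneinfo
entries can share identical contents, so a caller would still have to
choose among the candidates.  tools::md5sum() would be another way to
compare contents without leaving R.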

Martin


