[Rd] Suggestions for improvement as regards `as` methods, and a call for consistency in `as.Date` methods
Michael Chirico
michaelchirico4 at gmail.com
Wed Jan 27 04:12:21 CET 2016
Good evening all,
This topic is discussed at more length in my related Stack Overflow
question:
http://stackoverflow.com/questions/34647674/why-do-as-methods-remove-vector-names-and-is-there-a-way-around-it
Despite the abundant insight I received on SO, two issues remain:
1) _Why_ do as methods remove their arguments' names attribute?
This fact is mentioned briefly in a select few of the related help files,
namely ?as.vector ("removes *all* attributes including names for results
of atomic mode"), ?as.double ("strips attributes including names.") and
?as.character ("strips attributes including names"). However, (a) none of
these references explains the reasoning behind this behavior (the only
reason I can think of is speed), and (b) it would be much easier for users
to digest if this information (even the same blurb, copy-pasted) were
placed in all of the as help pages (e.g., ?as, ?as.numeric, ?as.Date,
?as.POSIXct, etc.).
Personally, I think that unless there's a substantial efficiency cost to
doing so, the default should in fact be to retain names (if not other
attributes).
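To make the current behavior concrete, here is a short sketch (the vector x and its values are made up for illustration). It shows the as.* coercions dropping names, plus two workarounds that keep them: copying the names back with setNames(), or changing the storage mode in place with storage.mode<-, which keeps the object's attributes. It only illustrates what happens; it says nothing about why the as.* functions were designed this way.

```r
# Illustrative named vector (hypothetical data)
x <- c(a = "1", b = "2.5")

# Coercion with as.* drops the names
print(as.numeric(x))
print(as.vector(c(a = 1, b = 2)))

# Workaround 1: coerce, then copy the names back
y1 <- stats::setNames(as.numeric(x), names(x))

# Workaround 2: change the storage mode in place; attributes (names) are kept
y2 <- x
storage.mode(y2) <- "double"

print(y1)
print(y2)
```

The second workaround is the same mechanism that keeps the dim attribute when a matrix's storage mode is changed, which is part of why dropping names in as.* can feel inconsistent.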
2) All as.Date methods should behave consistently as regards attribute
retention
As explained in the referenced SO question, the following calls should all
give the same result (as analogous calls to other as methods do), but they
don't:
datesc <- c(ind = "2015-07-04", nyd = "2016-01-01")
datesn <- c(ind = 16620, nyd = 16801)
datesp <- structure(c(1435982400, 1451624400), .Names = c("ind", "nyd"),
                    class = c("POSIXct", "POSIXt"), tzone = "")
datesl <- structure(list(sec = c(0, 0), min = c(0L, 0L),
                         hour = c(0L, 0L), mday = c(4L, 1L),
                         mon = c(6L, 0L),
                         year = structure(115:116, .Names = c("ind", "nyd")),
                         wday = c(6L, 5L), yday = c(184L, 0L),
                         isdst = c(1L, 0L), zone = c("EDT", "EST"),
                         gmtoff = c(NA_integer_, NA_integer_)),
                    .Names = c("sec", "min", "hour", "mday",
                               "mon", "year", "wday", "yday",
                               "isdst", "zone", "gmtoff"),
                    class = c("POSIXlt", "POSIXt"))
These retain names:
as.Date.numeric(datesn)
#          ind          nyd
# "2015-07-04" "2016-01-01"
as.Date.POSIXct(datesp)
#          ind          nyd
# "2015-07-04" "2016-01-01"
These destroy names:
as.Date.POSIXlt(datesl)
# [1] "2015-07-04" "2016-01-01"
as.Date.character(datesc)
# [1] "2015-07-04" "2016-01-01"
(I have not confirmed this, but a glance at the code suggests that
as.Date.date, as.Date.dates, as.Date.ts, as.Date.yearmon, and
as.Date.yearqtr also strip the names.)
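One way to audit this across input classes is to convert a named vector of each class and record whether the names survive. The sketch below rebuilds simplified versions of the inputs above (using tzone = "UTC" instead of the local time zone, an assumption made so the example is reproducible, and an explicit origin for the numeric case) and reports, per class, whether the result still carries the names c("ind", "nyd"). The outcome may differ between R versions, so no expected values are shown.

```r
# Simplified, reproducible versions of the named inputs (UTC instead of local time)
datesc <- c(ind = "2015-07-04", nyd = "2016-01-01")
datesn <- c(ind = 16620, nyd = 16801)
datesp <- structure(c(1435982400, 1451624400), names = c("ind", "nyd"),
                    class = c("POSIXct", "POSIXt"), tzone = "UTC")

# Each entry performs one conversion that dispatches to a different as.Date method
conversions <- list(
  character = function() as.Date(datesc),
  numeric   = function() as.Date(datesn, origin = "1970-01-01"),
  POSIXct   = function() as.Date(datesp),
  POSIXlt   = function() as.Date(as.POSIXlt(datesp))
)

# TRUE where the converted result kept the input's names
keeps_names <- vapply(conversions,
                      function(f) identical(names(f()), c("ind", "nyd")),
                      logical(1))
print(keeps_names)
```

A consistent set of methods would give the same value in every position, whichever default is chosen.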
Whatever the default behavior with respect to keeping or dropping names
(and other attributes), for the sake of consistency these as.Date methods
should be unified.
Short of overhauling all as methods to retain names, this would mean
changes such as the following:
as.Date.numeric <- function (x, origin, ...)
{
    if (missing(origin))
        origin <- "1970-01-01"
    if (identical(origin, "0000-00-00"))
        origin <- as.Date("0000-01-01", ...) - 1
    # proposed change: drop the names that the addition would otherwise keep
    setNames(as.Date(origin, ...) + x, NULL)
}
as.Date.POSIXct <- function (x, tz = "UTC", ...)
{
    if (tz == "UTC") {
        z <- floor(unclass(x)/86400)
        attr(z, "tzone") <- NULL
        attr(z, "names") <- NULL  # proposed addition: strip names here too
        structure(z, class = "Date")
    }
    else as.Date(as.POSIXlt(x, tz = tz))
}
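Until the methods themselves agree, a user-level wrapper can enforce one convention for every input class. The sketch below is hypothetical (as_date_uniform and its keep.names argument are not part of base R or any package): it records the input's names before dispatch, then either drops them or copies them onto the result, so the choice of as.Date method no longer matters.

```r
# Hypothetical wrapper, not part of base R: make the treatment of names
# independent of which as.Date method is dispatched.
as_date_uniform <- function(x, ..., keep.names = FALSE) {
  nms <- names(x)          # for POSIXlt input this dispatches to names.POSIXlt
  res <- as.Date(x, ...)   # the usual as.Date method does the conversion
  names(res) <- if (keep.names) nms else NULL
  res
}

d <- c(ind = "2015-07-04", nyd = "2016-01-01")
print(as_date_uniform(d))                     # names dropped
print(as_date_uniform(d, keep.names = TRUE))  # names kept
```

Capturing the names before calling as.Date is the design choice that makes this work: the result no longer depends on whether a particular method happened to preserve them.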
Thank you in advance for your consideration and thank you as always for
your time on this project.
Michael Chirico
PhD Candidate in Economics
University of Pennsylvania
3718 Locust Walk
Room 160 McNeil Building
Philadelphia, PA 19104
United States of America