[R] by inconsistently strips class - with fix
Alex Brown
fishtank at compsoc.man.ac.uk
Tue Apr 15 13:28:15 CEST 2008
summary:
The function 'by' inconsistently strips class from the data to which
it is applied.
quick reason:
tapply strips class when simplify is set to TRUE (the default) due to
the class stripping behaviour of unlist.
quick answer:
This can be fixed by invoking tapply with simplify=FALSE, or changing
tapply to use do.call(c instead of unlist
executable example:
mytimes=data.frame(date = 1:3 + Sys.time(), set = c(1,1,2))
by(mytimes$date, mytimes$set, function(x)x)
INDICES: 1
[1] "2008-04-15 11:41:38 BST" "2008-04-15 11:41:39 BST"
----------------------------------------------------------------------------------------
INDICES: 2
[1] "2008-04-15 11:41:40 BST"
by(mytimes[1,]$date, mytimes[1,]$set, function(x)x)
INDICES: 1
[1] 1208256099
why this is a problem:
This is a problem when you are feeding the output of this by into a
function which expects the class to be maintained. I see this problem
when constructing
reason:
tapply strips class when simplify is set to TRUE (the default) due to
the behaviour of unlist:
"Where possible the list elements are coerced to a common mode during
the unlisting, and so the result often ends up as a character vector.
Vectors will be coerced to the highest type of the components in the
hierarchy NULL < raw < logical < integer < real < complex < character
< list < expression: pairlists are treated as lists."
solution:
This problem can be fixed in the function by.data.frame by modifying
the call to tapply in the function "by":
by.data.frame = function (data, INDICES, FUN, ...)
{
if (!is.list(INDICES)) {
IND <- vector("list", 1)
IND[[1]] <- INDICES
names(IND) <- deparse(substitute(INDICES))[1]
}
else IND <- INDICES
FUNx <- function(x) FUN(data[x, ], ...)
nd <- nrow(data)
<<<<
ans <- eval(substitute(tapply(1:nd, IND, FUNx)), data)
====
ans <- eval(substitute(tapply(1:nd, IND, FUNx, simplify=FALSE)),
data)
>>>>
attr(ans, "call") <- match.call()
class(ans) <- "by"
ans
}
alternative solution:
the call in tapply to unlist(ans, recursive=F) can be replaced by
do.call(c,ans, recursive=F) to fix this issue, since c does not strip
class.
However, I haven't taken the time to work out if this will work in all
cases.
for example:
function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
{
FUN <- if (!is.null(FUN))
match.fun(FUN)
if (!is.list(INDEX))
INDEX <- list(INDEX)
nI <- length(INDEX)
namelist <- vector("list", nI)
names(namelist) <- names(INDEX)
extent <- integer(nI)
nx <- length(X)
one <- 1L
group <- rep.int(one, nx)
ngroup <- one
for (i in seq.int(INDEX)) {
index <- as.factor(INDEX[[i]])
if (length(index) != nx)
stop("arguments must have same length")
namelist[[i]] <- levels(index)
extent[i] <- nlevels(index)
group <- group + ngroup * (as.integer(index) - one)
ngroup <- ngroup * nlevels(index)
}
if (is.null(FUN))
return(group)
ans <- lapply(split(X, group), FUN, ...)
index <- as.integer(names(ans))
if (simplify && all(unlist(lapply(ans, length)) == 1)) {
ansmat <- array(dim = extent, dimnames = namelist)
<<<<
ans <- unlist(ans, recursive = FALSE)
====
ans <- do.call(c, ans, recursive = FALSE)
>>>>
}
else {
ansmat <- array(vector("list", prod(extent)), dim = extent,
dimnames = namelist)
}
if (length(index)) {
names(ans) <- NULL
ansmat[index] <- ans
}
ansmat
}
Alexander Brown
Principal Engineer
Transitive
Maybrook House, 40 Blackfriars Street, Manchester M3 2EG
Phone: +44 (0)161 836 2321 Fax: +44 (0)161 836 2399 Mobile: +44
(0)7980 708 221
www.transitive.com
* The leader in cross-platform virtualization
More information about the R-help
mailing list