[Rd] Possible bug in "unsplit" (PR#14084)
ivar.herfindal at bio.ntnu.no
Wed Nov 25 15:30:12 CET 2009
Dear R-bug-people
I have encountered a problem with "unsplit", which I believe may be
caused by a bug in the function. However, being inexperienced with bug
reports, I apologise if this is merely a user problem rather than a
problem within R.
The problem occurs if an object is split by several grouping factors
with levels not occurring in the data, using drop = TRUE. This may
appear to be a special and hardly relevant case, but I had to split a
data frame on several factors, do some analyses on each of the subsets
in the split object, and then unsplit it. I had to use drop = TRUE,
otherwise my analyses would not run. I found a fix for unsplit; I
believe the problem is that the drop argument is not passed on in the
recursive call to unsplit within unsplit. Description and example
below. The problem was found on R versions 2.9.0 and 2.10.0 on
Windows XP.
> sessionInfo()
R version 2.10.0 (2009-10-26)
i386-pc-mingw32
locale:
[1] LC_COLLATE=Norwegian (Bokmål)_Norway.1252 LC_CTYPE=Norwegian
(Bokmål)_Norway.1252
[3] LC_MONETARY=Norwegian (Bokmål)_Norway.1252 LC_NUMERIC=C
[5] LC_TIME=Norwegian (Bokmål)_Norway.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_2.10.0
>
## a reproducible example:
dff <- data.frame(gr1=factor(c(1,1,1,1,1,2,2,2,2,2,2),
levels=c(1,2,3,4)), gr2=factor(c(1,2,1,2,1,2,1,2,1,2,3),
levels=c(1,2,3,4)), yy=rnorm(11))
# note that the two factors "gr1" and "gr2" have defined levels which
# do not occur in the data.
dff2 <- split(dff, list(dff$gr1, dff$gr2), drop=TRUE)
# I don't want empty objects, so I use drop=TRUE
# now I want to unsplit it, and expect the following to work:
dff3 <- unsplit(dff2, list(dff$gr1, dff$gr2), drop=TRUE)
Error in `row.names<-.data.frame`(`*tmp*`, value = c("1", "11", "3",
"11", :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': 1, 11, 3, 5
### end
Looking at the unsplit function, we find:
> unsplit
function (value, f, drop = FALSE)
{
    len <- length(if (is.list(f)) f[[1L]] else f)
    if (is.data.frame(value[[1L]])) {
        x <- value[[1L]][rep(NA, len), , drop = FALSE]
        rownames(x) <- unsplit(lapply(value, rownames), f)
    }
    else x <- value[[1L]][rep(NA, len)]
    split(x, f, drop = drop) <- value
    x
}
<environment: namespace:base>
>
Note that if "value" is a data.frame, then rownames for the output x is
made by the call:
rownames(x) <- unsplit(lapply(value, rownames), f)
This call to unsplit ignores the drop-argument, and in the example above
we get from this call:
> unsplit(lapply(dff2, rownames), list(dff$gr1, dff$gr2))
[1] "1" "11" "3" "11" "5" "1" "7" "3" "9" "5" "11"
i.e. not unique row names for the output x.
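The duplication appears to come from a mismatch in group counts: with
drop = TRUE, split returns only the 5 non-empty combinations, while the
inner call (with the default drop = FALSE) works through all 16
combinations of levels and, as far as I can see, recycles the 5 pieces
over them. A minimal sketch using only the grouping factors from the
example; the object names f, pieces, rn and rn_ok are chosen here for
illustration and are not part of the original code:

```r
gr1 <- factor(c(1,1,1,1,1,2,2,2,2,2,2), levels = c(1,2,3,4))
gr2 <- factor(c(1,2,1,2,1,2,1,2,1,2,3), levels = c(1,2,3,4))
f <- list(gr1, gr2)

# 4 x 4 = 16 level combinations, of which only 5 occur in the data
n_all  <- nlevels(interaction(f))
n_used <- nlevels(interaction(f, drop = TRUE))

# row positions (as character, like row names) grouped the same way
# as split(dff, f, drop = TRUE) groups the rows
pieces <- split(as.character(seq_along(gr1)), f, drop = TRUE)

# same form as the call inside unsplit: drop is left at FALSE
rn <- unsplit(pieces, f)
anyDuplicated(rn) > 0    # TRUE, the pieces land in the wrong groups

# with drop = TRUE the 5 pieces line up with the 5 groups again
rn_ok <- unsplit(pieces, f, drop = TRUE)
```

Here rn reproduces the duplicated values shown above, whereas rn_ok is
simply "1" to "11" in the original order.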
A simple fix is to add drop = drop to that call, such that the
updated unsplit (here called unsplit2) is like this:
unsplit2 <- function (value, f, drop = FALSE)
{
    len <- length(if (is.list(f)) f[[1L]] else f)
    if (is.data.frame(value[[1L]])) {
        x <- value[[1L]][rep(NA, len), , drop = FALSE]
        # note new "drop=drop"
        rownames(x) <- unsplit(lapply(value, rownames), f, drop=drop)
    }
    else x <- value[[1L]][rep(NA, len)]
    split(x, f, drop = drop) <- value
    x
}
This works fine in the example above, and the original levels of gr1 and
gr2 (i.e. they both have four levels) are maintained in the output data
frame, such that it has the same structure as the original dff:
> dff3 <- unsplit2(dff2, list(dff$gr1, dff$gr2), drop=TRUE)
> dff3
   gr1 gr2          yy
1    1   1  2.13749771
2    1   2 -0.02166458
3    1   1  0.45960452
4    1   2  2.72074958
5    1   1 -0.17536995
6    2   2 -0.08909495
7    2   1  0.94260802
8    2   2 -0.09979505
9    2   1  1.22240834
10   2   2 -0.81710781
11   2   3  0.76071130
>
I must admit that I have not had the possibility to check whether such a
quick fix conflicts with other uses of unsplit or with other types of
data, but I cannot see that it should be a problem.
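One way to gain some confidence would be a small round-trip check. The
following is only a sketch, not an exhaustive test: it repeats the
unsplit2 definition, adds set.seed(1) for reproducibility (an addition
not in the example above), and uses the illustrative names f, back and
v. It checks the data frame case with drop = TRUE and a plain vector
split on a single factor with the default drop = FALSE:

```r
unsplit2 <- function (value, f, drop = FALSE)
{
    len <- length(if (is.list(f)) f[[1L]] else f)
    if (is.data.frame(value[[1L]])) {
        x <- value[[1L]][rep(NA, len), , drop = FALSE]
        rownames(x) <- unsplit(lapply(value, rownames), f, drop = drop)
    }
    else x <- value[[1L]][rep(NA, len)]
    split(x, f, drop = drop) <- value
    x
}

set.seed(1)  # assumption: fixed seed so that the check is repeatable
dff <- data.frame(gr1 = factor(c(1,1,1,1,1,2,2,2,2,2,2), levels = c(1,2,3,4)),
                  gr2 = factor(c(1,2,1,2,1,2,1,2,1,2,3), levels = c(1,2,3,4)),
                  yy  = rnorm(11))
f <- list(dff$gr1, dff$gr2)

# data frame, several factors with unused levels, drop = TRUE
back <- unsplit2(split(dff, f, drop = TRUE), f, drop = TRUE)
stopifnot(identical(rownames(back), rownames(dff)),
          identical(levels(back$gr1), levels(dff$gr1)),
          isTRUE(all.equal(back$yy, dff$yy)))

# plain vector, single factor, default drop = FALSE
v <- unsplit2(split(dff$yy, dff$gr1), dff$gr1)
stopifnot(isTRUE(all.equal(v, dff$yy)))
```

If either stopifnot fails, the change affects a case that worked before.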
Sincerely
Ivar Herfindal
--------------------------------
Centre for Conservation Biology
Norwegian University for Science and Technology
N-7491 Trondheim, Norway
email: ivar.herfindal at bio.ntnu.no