[Rd] 'ordered' destroyed to 'factor'
Robert McGehee
rmcgehee at walleyetrading.net
Fri Jun 16 15:59:45 CEST 2017
Hi,
It's been my experience that when you combine or aggregate vectors of factors using a function, you should be prepared for surprises, as it's not obvious what the "right" way to combine factors is (ordered or not), especially if two vectors of factors have different levels or (if ordered) are ordered in a different way.
For instance, what would you expect to get from unlist() if the elements of the list had different levels, if they were all ordered but in different ways, or if some elements were factors and others were ordered factors?
> unlist(list(ordered(c("a","b")), ordered(c("b","a"), levels=c("b","a"))))
[1] ?
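As a minimal sketch of what unlist() actually does here, the snippet below follows the rule documented in ?unlist for lists of factors: the result is a plain factor whose levels are the union of the input level sets, in order of first appearance. Note that ordered(c("b","a")) on its own sorts its levels to "a" < "b", so a reversed order has to be requested explicitly through `levels`. The variable names are illustrative only.

```r
# Two ordered factors that disagree on level order.
f1 <- ordered(c("a", "b"))                       # levels: a < b
f2 <- ordered(c("b", "a"), levels = c("b", "a")) # levels: b < a

u <- unlist(list(f1, f2))

# unlist() builds a plain "factor" (never "ordered"); its levels are the
# union of the inputs' level sets, taken in order of first appearance,
# so f2's ordering is silently discarded.
print(u)
print(class(u))
print(levels(u))
```

The design choice in unlist() is to keep only label information and drop any ordering, which avoids having to pick a "winning" order when the inputs disagree.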
Honestly, what surprised me most about your question was that unlist() returned a factor at all. By contrast, c() simply converts factors to their integer codes:
> c(ordered(c("a","b")), ordered(c("a","b")))
[1] 1 2 1 2
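The c() result above is version-dependent: R 3.4.0 (the version in this thread) drops the factor attributes and returns integer codes, while R 4.1.0 added a c() method for factors that returns an ordered factor when all inputs are ordered with identical levels. A sketch of a route that behaves the same in either version, assuming the shared level order is known, is to concatenate the labels and rebuild the ordered factor explicitly:

```r
a <- ordered(c("a", "b"))
b <- ordered(c("a", "b"))

# Version-dependent: integer codes before R 4.1.0, an ordered factor from
# R 4.1.0 on (both inputs are ordered with identical levels here).
via_c <- c(a, b)

# Portable alternative: concatenate the labels as character, then rebuild
# the ordered factor with the level order stated explicitly.
via_chr <- ordered(c(as.character(a), as.character(b)), levels = levels(a))
print(via_chr)
```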
And here's an especially weird one. When you rbind() data frames containing an ordered factor, you still get an ordered factor back, but its level order may differ from that of either original:
> x1 <- data.frame(a=ordered(c("b","c")))
> x2 <- data.frame(a=ordered(c("a","b","c")))
> str(rbind(x1,x2)) # Note b < a
'data.frame': 5 obs. of 1 variable:
$ a: Ord.factor w/ 3 levels "b"<"c"<"a": 1 2 3 1 2
Should rbind() have returned integers like c(), returned a plain factor like unlist(), or kept the result as an ordered factor but ordered it differently? I have no idea.
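Whatever the answer, a minimal sketch of one workaround, assuming the intended order is "a" < "b" < "c", is to re-impose the level order on the combined column after rbind(), going through the labels so the merged level order from rbind() cannot leak through:

```r
x1 <- data.frame(a = ordered(c("b", "c")))
x2 <- data.frame(a = ordered(c("a", "b", "c")))
y <- rbind(x1, x2)

# Rebuild the column from its labels with the intended level order stated
# explicitly, instead of relying on the order rbind() merged the levels in.
y$a <- ordered(as.character(y$a), levels = c("a", "b", "c"))
str(y)
```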
So, in short, IMO there are definitely inconsistencies in how factors and ordered factors are handled across functions, but it would be hard to point to any single function and say it is wrong or needs to change. My best advice is simply to be careful when combining or aggregating factors.
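One way to turn "be careful" into code is a small guard that refuses to combine ordered factors unless they agree on the level order. The sketch below is a hypothetical helper (combine_ordered() is not part of base R); it fails loudly rather than guessing an order, and otherwise concatenates labels and rebuilds with the shared order:

```r
# Hypothetical helper, not part of base R: combine ordered factors only when
# every input is ordered and shares exactly the same level order.
combine_ordered <- function(...) {
  parts <- list(...)
  lv <- levels(parts[[1]])
  ok <- vapply(parts,
               function(p) is.ordered(p) && identical(levels(p), lv),
               logical(1))
  if (!all(ok)) {
    stop("inputs must all be ordered factors with identical levels")
  }
  # Concatenate as plain character, then rebuild with the shared order.
  ordered(unlist(lapply(parts, as.character)), levels = lv)
}

f <- ordered(c("lo", "hi"), levels = c("lo", "hi"))
g <- ordered(c("hi", "hi"), levels = c("lo", "hi"))
combine_ordered(f, g)
```

Passing an ordered factor with levels c("hi", "lo") alongside f would raise an error instead of silently producing a merged order.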
--Robert
-----Original Message-----
From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of "Jens Oehlschlägel"
Sent: Friday, June 16, 2017 9:04 AM
To: r-devel at r-project.org
Cc: jens.oehlschlaegel at truecluster.com
Subject: [Rd] 'ordered' destroyed to 'factor'
Dear all,
I don't know whether you consider this a bug or a feature, but it breaks reasonable code: 'unlist' and 'sapply' convert 'ordered' to 'factor' even when all levels are identical. Here is a simple example:
o <- ordered(letters)
o[[1]]
lapply(o, min)[[1]] # ordered factor
unlist(lapply(o, min))[[1]] # no longer ordered
sapply(o, min)[[1]] # no longer ordered
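Until the behavior changes, a minimal sketch of a workaround for the example above is to let unlist() simplify and then restore the ordered class, assuming the level order of `o` is the one wanted. lapply() on a factor keeps each element as a one-element ordered factor, so only the final flattening step loses the ordering:

```r
o <- ordered(letters)

# unlist() (and hence sapply()'s simplification) yields a plain factor.
flat <- unlist(lapply(o, min))

# Restore the ordered class using the original level order of `o`.
restored <- ordered(as.character(flat), levels = levels(o))
restored[[1]]
```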
Jens Oehlschlägel
P.S.: The examples above are deliberately trivial for easy reproduction. The current behavior broke my use case, which had a structure like this:
# have some data
x <- 1:20
# apply some function to each element
somefunc <- function(x){
# do something and return an ordinal level
sample(o, 1)
}
x <- sapply(x, somefunc)
# get minimum result
min(x)
# Error in Summary.factor(c(2L, 26L), na.rm = FALSE) :
# ‘min’ not meaningful for factors
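For a use case shaped like this, one sketch is to collect integer codes with vapply() (whose FUN.VALUE = integer(1) enforces one integer per call) and map them back through levels(o), so no simplification step ever touches the factor class. The deterministic somefunc() below is a hypothetical stand-in for the random sample(o, 1), chosen only so the result is reproducible:

```r
o <- ordered(letters)

# Hypothetical deterministic stand-in for somefunc(): returns one level of o.
somefunc <- function(i) o[(i %% 26) + 1]

# Collect the integer code of each result; vapply() checks each is integer(1).
codes <- vapply(1:20, function(i) as.integer(somefunc(i)), integer(1))

# Map the codes back to labels and rebuild with the original level order.
x <- ordered(levels(o)[codes], levels = levels(o))
min(x)
```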
> version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          4.0
year           2017
month          04
day            21
svn rev        72570
language       R
version.string R version 3.4.0 (2017-04-21)
nickname       You Stupid Darkness
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel