[Rd] Suggestion on default 'levels' in 'factor'
Suharto Anggono
suharto_anggono at yahoo.com
Fri May 6 10:05:26 CEST 2016
On first reading, the logic of the following fragment in the code of function 'factor' was not clear to me.
if (missing(levels)) {
    y <- unique(x, nmax = nmax)
    ind <- sort.list(y) # or possibly order(x) which is more (too ?) tolerant
    y <- as.character(y)
    levels <- unique(y[ind])
}
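To make the fragment's logic concrete, here is a minimal sketch that wraps it in a hypothetical helper, default_levels_current() (not part of base R; the name and the nmax = NA default, chosen to mirror 'factor', are assumptions for illustration). The unique values are ordered in their original type and only then converted to character.

```r
# Hypothetical helper (not part of base R): the default-levels fragment
# from the current 'factor', wrapped in a function for experimentation.
default_levels_current <- function(x, nmax = NA) {
    y <- unique(x, nmax = nmax)   # distinct values, original type kept
    ind <- sort.list(y)           # permutation that sorts them, NA last
    y <- as.character(y)          # convert only after the order is known
    unique(y[ind])                # sorted levels as character
}

default_levels_current(c(10, 2, NA, 2))    # returns c("2", "10", NA)
default_levels_current(c("b", "a", NA, "b"))  # returns c("a", "b", NA)
```

Because the ordering is computed on the numeric values, "2" comes before "10"; sorting the strings themselves would typically put "10" first.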
Code similar to that originally proposed in https://stat.ethz.ch/pipermail/r-devel/2009-May/053316.html is more readable to me.
I suggest using the following:
if (missing(levels))
    levels <- unique(as.character(
        sort.int(unique(x, nmax = nmax), na.last = TRUE) # or possibly sort(x) which is more (too ?) tolerant
    ))
I assume that as.character(y)[sort.list(y)] is equivalent to as.character(sort.int(y, na.last = TRUE)). If so, what I suggest above has the same effect as the code in the current 'factor'. I use 'sort.int' rather than 'sort' so that, like 'sort.list', it fails for non-atomic input.
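The assumed equivalence can be spot-checked. The sketch below uses two hypothetical helpers, default_levels_current() and default_levels_proposed() (neither is part of base R), to compare both expressions on a few small atomic vectors, and confirms that the sort.int() form rejects a list. A handful of examples is evidence, not a proof for every input type.

```r
# Hypothetical helpers for illustration; neither is part of base R.
# Current default: order the unique values, then convert to character.
default_levels_current <- function(x, nmax = NA) {
    y <- unique(x, nmax = nmax)
    ind <- sort.list(y)
    y <- as.character(y)
    unique(y[ind])
}

# Suggested default: sort the unique values (NA last), then convert.
default_levels_proposed <- function(x, nmax = NA)
    unique(as.character(sort.int(unique(x, nmax = nmax), na.last = TRUE)))

# Spot-check agreement on atomic vectors of several types.
inputs <- list(c(10, 2, NA, 2), c("b", "a", NA, "b"), c(TRUE, NA, FALSE), integer(0))
for (x in inputs)
    stopifnot(identical(default_levels_current(x), default_levels_proposed(x)))

# sort.int() rejects non-atomic input such as a list ("'x' must be atomic").
stopifnot(inherits(try(default_levels_proposed(list(2, 1)), silent = TRUE), "try-error"))
```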
What I suggest is similar in form to the default 'levels' in 'factor' in R before version 2.10.0, which is
sort(unique.default(x), na.last = TRUE)
If this suggestion is adopted, the help page for 'factor' could be changed to say "(by 'sort.int')" instead of "(by 'sort.list')".