[Rd] Suggestion of change to reduce overhead of 'table'
Suharto Anggono
suharto_anggono at yahoo.com
Thu Dec 13 08:24:51 CET 2012
In R 2.7.2, if argument 'exclude' is not specified and the input is already a factor, function 'table' uses the input as is. In R 2.15.2, in the same case, 'table' always applies function 'factor' to the input. The time spent in 'factor' is not long, but it is not negligible.
I suggest changing 'table' so that 'factor' is not called on input that is already a factor when the resulting levels are known to be the same as in the input. This is a diff against https://svn.r-project.org/R/trunk/src/library/base/R/table.R.
85c85,88
< a <- factor(a, levels = ll[!(ll %in% exclude)],
---
> llexcl <- ll %in% exclude
> if (any(llexcl) ||
> (useNA == "no" && any(is.na(ll))))
> factor(a, levels = ll[!llexcl],
86a90,91
> else
> a
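The condition in the patched branch can be tried outside 'table' as a small standalone function. This is only a sketch: the name 'relevel_for_table' is hypothetical, and the default 'exclude = c(NA, NaN)' is assumed to match the default 'table' uses when useNA is "no". It returns the input unchanged when no level would be dropped, and otherwise calls 'factor' exactly as the patched branch does.

```r
# Hypothetical helper mirroring the proposed branch of 'table': re-run
# factor() only when some level of 'a' would actually be dropped.
relevel_for_table <- function(a, exclude = c(NA, NaN), useNA = "no") {
    ll <- levels(a)
    llexcl <- ll %in% exclude
    if (any(llexcl) || (useNA == "no" && any(is.na(ll))))
        factor(a, levels = ll[!llexcl],
               exclude = if (useNA == "no") NA)
    else
        a  # levels already as required: reuse the input unchanged
}

f <- factor(c("a", "b", "a", NA))
identical(relevel_for_table(f), f)            # TRUE: no level is excluded
levels(relevel_for_table(f, exclude = "b"))   # "a"
```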
Function 'table' calls function 'addNA' in some cases, so I suggest changing 'addNA', too. This is a diff against https://svn.r-project.org/R/trunk/src/library/base/R/factor.R.
336d335
< if (ifany & !any(is.na(x))) return(x)
338c337,339
< if (!any(is.na(ll))) ll <- c(ll, NA)
---
> hasNAlev <- any(is.na(ll))
> if ((ifany || hasNAlev) && !any(is.na(x))) return(x)
> if (!hasNAlev) ll <- c(ll, NA)
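Applying the diff above to 'addNA' as it appears in R 2.15.2 gives roughly the following function. It is renamed 'addNA_patched' (a hypothetical name) so it can be tried without masking base::addNA; the lines outside the diff are assumed to be unchanged in trunk. Note that is.na() on a factor looks at the internal codes, so an element already mapped to an NA level does not count as missing.

```r
# Sketch of 'addNA' with the proposed diff applied (hypothetical name).
addNA_patched <- function(x, ifany = FALSE) {
    if (!is.factor(x)) x <- factor(x)
    ll <- levels(x)
    hasNAlev <- any(is.na(ll))
    # Return x unchanged when it has no NA codes and either an NA level
    # is only wanted if needed (ifany) or x already has an NA level.
    if ((ifany || hasNAlev) && !any(is.na(x))) return(x)
    if (!hasNAlev) ll <- c(ll, NA)
    factor(x, levels = ll, exclude = NULL)
}

f <- factor(c("a", "b"))
identical(addNA_patched(f, ifany = TRUE), f)  # TRUE: no NA present
levels(addNA_patched(f))                      # "a" "b" NA
```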
Alternatively, instead of calling 'factor', 'addNA' could change the "levels" attribute directly and, accordingly, replace the missing values in the factor's internal integer codes with the code of the NA level.
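A minimal sketch of that alternative follows, assuming a factor is an integer vector of codes with a "levels" attribute and a "factor" class. The name 'addNA_direct' is hypothetical and it has not been benchmarked. It sets the attribute with attr() because the `levels<-` method for factors drops NA from the new levels. Unlike 'factor', it keeps any extra attributes of 'x'.

```r
# Hypothetical variant of 'addNA' that avoids calling factor():
# append an NA level if needed, then point NA codes at that level.
addNA_direct <- function(x, ifany = FALSE) {
    if (!is.factor(x)) x <- factor(x)
    ll <- levels(x)
    nas <- is.na(x)                   # TRUE where the internal code is NA
    hasNAlev <- any(is.na(ll))
    if ((ifany || hasNAlev) && !any(nas)) return(x)
    if (!hasNAlev) ll <- c(ll, NA)
    napos <- which(is.na(ll))[1L]     # integer position of the NA level
    codes <- unclass(x)               # integer codes, other attributes kept
    codes[nas] <- napos               # fill missing codes with the NA level
    attr(codes, "levels") <- ll       # set directly: `levels<-` would drop NA
    class(codes) <- class(x)          # restore "factor" (or "ordered")
    codes
}

x <- factor(c("a", NA, "b"))
levels(addNA_direct(x))       # "a" "b" NA
as.integer(addNA_direct(x))   # 1 3 2
```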