[Rd] Suggestion of change to reduce overhead of 'table'

Suharto Anggono Suharto Anggono suharto_anggono at yahoo.com
Thu Dec 13 08:24:51 CET 2012


In R 2.7.2, if argument 'exclude' is not specified and the input is already a factor, function 'table' uses the input as is. In R 2.15.2, in the same case, 'table' always applies 'factor' to the input. The time spent in 'factor' is small, but it is not negligible.
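
A rough way to see the overhead is to time 'factor' on an input that is already a factor and compare it with the time for 'table' on the same input. This is only a sketch: the vector size is an arbitrary choice of mine, and the timings depend on the machine and the R version.

```r
# Build a large factor in which every level is used.
set.seed(1)
f <- factor(sample(letters, 1e6, replace = TRUE))

# Re-applying factor() returns an equivalent factor but still takes time.
t_factor <- system.time(g <- factor(f))

# Time for table() on the same input, for comparison.
t_table <- system.time(tab <- table(f))

print(t_factor)
print(t_table)
```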


I suggest changing 'table' so that 'factor' is not called for input that is already a factor when it is known that the resulting levels are the same as in the input. This is a diff against https://svn.r-project.org/R/trunk/src/library/base/R/table.R.

85c85,88
<                         a <- factor(a, levels = ll[!(ll %in% exclude)],
---
>                         llexcl <- ll %in% exclude
>                         if (any(llexcl) ||
>                         (useNA == "no" && any(is.na(ll))))
>                             factor(a, levels = ll[!llexcl],
86a90,91
>                         else
>                             a
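
The condition in the diff above can be tried outside of 'table' with a small standalone function. This is a sketch only: the name 'maybe_refactor' is hypothetical, and the 'exclude' argument passed to 'factor' is my assumption about the unchanged continuation line that the diff does not show.

```r
# Hypothetical helper mirroring the patched branch; not part of base R.
maybe_refactor <- function(a, exclude = c(NA, NaN), useNA = "no") {
    stopifnot(is.factor(a))
    ll <- levels(a)
    llexcl <- ll %in% exclude
    if (any(llexcl) || (useNA == "no" && any(is.na(ll))))
        # Some level must be dropped, so the factor has to be rebuilt.
        # Assumption: the 'exclude' value below matches the unchanged line.
        factor(a, levels = ll[!llexcl],
               exclude = if (useNA == "no") NA)
    else
        # Levels would come out unchanged, so return the input as is.
        a
}

a <- factor(c("x", "y", "x"))
unchanged <- maybe_refactor(a)                # no call to factor()
dropped   <- maybe_refactor(a, exclude = "y") # level "y" is removed
```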


Function 'table' calls function 'addNA' in some cases, so I suggest changing 'addNA', too. This is a diff against https://svn.r-project.org/R/trunk/src/library/base/R/factor.R.

336d335
<     if (ifany & !any(is.na(x))) return(x)
338c337,339
<     if (!any(is.na(ll))) ll <- c(ll, NA)
---
>     hasNAlev <- any(is.na(ll))
>     if ((ifany || hasNAlev) && !any(is.na(x))) return(x)
>     if (!hasNAlev) ll <- c(ll, NA)
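
With the diff above applied, 'addNA' would look roughly like the function below. The name 'addNA2' is hypothetical, and the lines not shown in the diff (the coercion to factor, the assignment of 'll' and the final call to 'factor') are my assumption about the surrounding code in factor.R.

```r
# Hypothetical patched version of addNA(); lines outside the diff are assumed.
addNA2 <- function(x, ifany = FALSE) {
    if (!is.factor(x)) x <- factor(x)
    ll <- levels(x)
    hasNAlev <- any(is.na(ll))
    # Return early when nothing would change: no missing values, and either
    # an NA level is only wanted if needed or it is already present.
    if ((ifany || hasNAlev) && !any(is.na(x))) return(x)
    if (!hasNAlev) ll <- c(ll, NA)
    factor(x, levels = ll, exclude = NULL)
}

x <- factor(c("a", "b"))                  # no missing values
y <- factor(c("a", NA, "b"))              # missing value, no NA level
z <- factor(c("a", NA), exclude = NULL)   # NA is already a level
```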


Alternatively, instead of calling 'factor', 'addNA' could change the "levels" attribute directly and set the internal integer codes of the missing values to the index of the NA level.
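
A sketch of that idea, which avoids 'factor' for input that is already a factor, might look like this. The name 'addNA_direct' is hypothetical. Unlike 'factor', this version keeps any extra attributes of the input, so it is not a drop-in replacement in every case.

```r
# Hypothetical variant of addNA() that edits levels and codes directly.
addNA_direct <- function(x, ifany = FALSE) {
    if (!is.factor(x)) x <- factor(x)
    ll <- levels(x)
    hasNAlev <- any(is.na(ll))
    isna <- is.na(x)
    if ((ifany || hasNAlev) && !any(isna)) return(x)
    if (!hasNAlev) ll <- c(ll, NA)
    cls <- oldClass(x)
    codes <- unclass(x)               # integer codes, attributes kept
    codes[isna] <- match(NA, ll)      # point missing values at the NA level
    attr(codes, "levels") <- ll
    class(codes) <- cls               # restore "factor" (and "ordered")
    codes
}

y <- factor(c("a", NA, "b"))
r <- addNA_direct(y)
```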


