[Rd] Efficiency of factor objects

Milan Bouchet-Valat nalimilan at club.fr
Sat Nov 5 17:30:26 CET 2011


On Friday 4 November 2011 at 19:19 -0400, Stavros Macrakis wrote:
> R factors are the natural way to represent factors -- and should be
> efficient since they use small integers.  But in fact, for many (but
> not all) operations, R factors are considerably slower than integers,
> or even character strings.  This appears to be because whenever a
> factor vector is subsetted, the entire levels vector is copied.
Is it so common for a factor to have so many levels? One can probably
argue that, in that case, using a numeric or character vector is
preferred - factors are no longer the "natural way" of representing this
kind of data.

Adding code to fix a completely theoretical problem is generally not a
good idea. I think you'd have to come up with a real use case to have any
hope of convincing the developers that a change is needed. There are
probably many more interesting areas where speedups can be gained.
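
One way to check whether the effect described in the quoted message matters
in practice is a small timing comparison like the sketch below. The sizes
(n_levels, n), the index vector idx and the repetition count are made-up
values for illustration, not figures from this thread, and the timings will
depend on the R version and the machine.

```r
# Hypothetical sizes, chosen only for illustration.
n_levels <- 1e5   # number of distinct levels
n <- 1e5          # length of the vector

set.seed(1)
codes <- sample.int(n_levels, n, replace = TRUE)  # plain integer codes
f <- factor(codes, levels = seq_len(n_levels))    # same data as a factor

idx <- 1:10  # a tiny subset

# Time many small subsets of each representation.
t_int <- system.time(for (i in 1:1000) codes[idx])[["elapsed"]]
t_fac <- system.time(for (i in 1:1000) f[idx])[["elapsed"]]

cat("integer:", t_int, "s; factor:", t_fac, "s\n")

# The subset of a factor keeps the full set of levels unless drop = TRUE.
sub_f <- f[idx]
cat("levels kept in a 10-element subset:", nlevels(sub_f), "\n")
```

If the two timings are close on realistic data, the concern is probably
theoretical; if they differ widely, that would be the kind of real use case
worth reporting.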


Regards


