[Rd] Efficiency of factor objects

Jeffrey Ryan jeffrey.ryan at lemnica.com
Sat Nov 5 17:45:14 CET 2011

Or, better still, extend R through the mechanisms already in place:
something akin to a fast-factor package. Any change to R itself risks
downstream breakage in (hundreds of?) millions of lines of deployed code.

It is hard to believe that a package for this doesn't already
exist. Have you searched CRAN?


On Sat, Nov 5, 2011 at 11:30 AM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
> On Friday, 04 November 2011 at 19:19 -0400, Stavros Macrakis wrote:
>> R factors are the natural way to represent factors -- and should be
>> efficient since they use small integers.  But in fact, for many (but
>> not all) operations, R factors are considerably slower than integers,
>> or even character strings.  This appears to be because whenever a
>> factor vector is subsetted, the entire levels vector is copied.
> Is it really so common for a factor to have that many levels? One could
> argue that, in such cases, a numeric or character vector is preferable:
> factors are no longer the "natural way" of representing this kind of
> data.
> Adding code to fix a purely theoretical problem is generally not a
> good idea. I think you'd have to come up with a real use case to have
> any hope of convincing the developers that a change is needed. There are
> probably many more promising areas where speedups can be gained.
> Regards
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

Jeffrey Ryan
jeffrey.ryan at lemnica.com

