[Rd] Efficiency of factor objects

Patrick Burns pburns at pburns.seanet.com
Sat Nov 5 19:12:56 CET 2011

Perhaps 'data.table', a package
on CRAN, would be acceptable.

On 05/11/2011 16:45, Jeffrey Ryan wrote:
> Or better still, extend R via the mechanisms in place.  Something akin
> to a fast factor package.  Any change to R causes downstream issues in
> (hundreds of?) millions of lines of deployed code.
> It almost seems hard to fathom that a package for this doesn't already
> exist. Have you searched CRAN?
> Jeff
> On Sat, Nov 5, 2011 at 11:30 AM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
>> On Friday, 4 November 2011 at 19:19 -0400, Stavros Macrakis wrote:
>>> R factors are the natural way to represent factors -- and should be
>>> efficient since they use small integers.  But in fact, for many (but
>>> not all) operations, R factors are considerably slower than integers,
>>> or even character strings.  This appears to be because whenever a
>>> factor vector is subsetted, the entire levels vector is copied.
>> Is it so common for a factor to have so many levels? One can probably
>> argue that, in that case, using a numeric or character vector is
>> preferred - factors are no longer the "natural way" of representing this
>> kind of data.
>> Adding code to fix a completely theoretical problem is generally not a
>> good idea. I think you'd have to come up with a real use case to hope
>> to convince the developers that a change is needed. There are probably many
>> more interesting areas where speedups can be gained than that.
>> Regards
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

Patrick Burns
pburns at pburns.seanet.com
twitter: @portfolioprobe
(home of 'Some hints for the R beginner'
and 'The R Inferno')
