[R] Opinion: Why I find factors convenient to use

Rui Barradas ruipbarradas at sapo.pt
Fri Aug 17 20:34:35 CEST 2012


Hello,

No, factors may use less memory. Is it system dependent?

 > x <-sample(c("small","medium","large"),1e4,rep=TRUE)
 > y <- factor(x)
 > object.size(x)
80184 bytes
 > object.size(y)
40576 bytes
 >
 > sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Portuguese_Portugal.1252 LC_CTYPE=Portuguese_Portugal.1252
[3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Portugal.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Rcapture_1.2-0 xts_0.8-0      zoo_1.7-7

loaded via a namespace (and not attached):
[1] chron_2.3-39   fortunes_1.4-2 grid_2.15.1    lattice_0.20-6 tools_2.15.1
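
A possible explanation for the two sets of numbers, as a minimal sketch: a character vector holds one pointer per element into R's shared string cache, while a factor holds one 4-byte integer code per element plus a small levels attribute. On a 64-bit build the pointers are 8 bytes, so the factor comes out at roughly half the size; on a 32-bit build the two would be about equal. That Bert's figures come from a 32-bit build is my assumption, since his platform is not stated. The seed and the names size_x and size_y are only for illustration.

```r
# Sketch: compare the storage of a character vector and the equivalent factor.
set.seed(1)  # arbitrary seed, only to make the sample reproducible
x <- sample(c("small", "medium", "large"), 1e4, replace = TRUE)
y <- factor(x)

typeof(x)                 # "character": one pointer per element
typeof(y)                 # "integer": one 4-byte code per element
.Machine$sizeof.pointer   # 8 on a 64-bit build, 4 on a 32-bit build

# object.size() returns an estimate; convert to plain numbers to compare.
size_x <- as.numeric(object.size(x))
size_y <- as.numeric(object.size(y))

# Approximate per-element cost of each representation.
size_x / length(x)
size_y / length(y)
```

With only three distinct strings the strings themselves cost almost nothing, so the per-element figures mostly reflect pointer size versus integer size.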


And I agree with what Steve said: stringsAsFactors = FALSE saves hours 
of debugging time.
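
To illustrate the kind of debugging this avoids, here is a small sketch. The inline CSV text and the names csv_text, d_chr, d_fac, f and size_f are made up for the example. (Note that the default of stringsAsFactors changed to FALSE in R 4.0.0; in R 2.15.1 it is TRUE.)

```r
# Hypothetical data, read from inline text instead of a file.
csv_text <- "id,size\n1,small\n2,large\n3,medium"

d_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
d_fac <- read.csv(text = csv_text, stringsAsFactors = TRUE)

is.character(d_chr$size)  # TRUE: strings stay strings
is.factor(d_fac$size)     # TRUE: strings were converted on import

# A classic pitfall: as.integer() on a factor returns the level codes,
# not the numbers the labels look like.
f <- factor(c("10", "20", "5"))
as.integer(f)                 # 1 2 3 (codes, levels sorted as strings)
as.integer(as.character(f))   # 10 20 5 (the intended values)

# Converting late, once the data are clean, lets you choose the level order.
size_f <- factor(d_chr$size, levels = c("small", "medium", "large"))
levels(size_f)
```

Keeping the columns as character during import and merging, and calling factor() only at the end, is one way to get the benefits of factors without the surprises.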

Rui Barradas

Em 17-08-2012 19:19, Bert Gunter escreveu:
> Steve, et al.:
>
> Yes, if object.size() is to be believed, you're right:
>
>> x <-sample(c("small","medium","large"),1e4,rep=TRUE)
>> y <- factor(x)
>> object.size(x)
> 40120 bytes
>> object.size(y)
> 40336 bytes
>
> I stand (happily) corrected.
>
> -- Bert
>
> On Fri, Aug 17, 2012 at 11:09 AM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
>> Hi,
>>
>> On Fri, Aug 17, 2012 at 1:58 PM, Jeff Newmiller
>> <jdnewmil at dcn.davis.ca.us> wrote:
>>> I don't know if my recent post on this prompted your post, but I don't see much to argue with in your discussion. I find factors to be useful for managing display and some kinds of analysis.
>>>
>>> However, I find them mostly a handicap when importing, merging, and handling data QC. Therefore I delay conversion until late in the game... but usually I do eventually convert in most cases.
>> Agreed here -- I actually haven't been tuned into any such recent
>> conversation (if there was one), but if I were a gambling man, I'd bet
>> that the majority of the problems people have with factors can
>> probably be boiled down to the fact that the default value for
>> stringsAsFactors is TRUE.
>>
>> I like factors -- that said, I am annoyed by them at times, but I
>> still like them.
>>
>> Also, Bert mentioned that he thinks they save space over characters --
>> I believe that this is no longer true, but I'm not certain.
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>>   | Memorial Sloan-Kettering Cancer Center
>>   | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
>



