[R] Opinion: Why I find factors convenient to use
Rui Barradas
ruipbarradas at sapo.pt
Fri Aug 17 20:34:35 CEST 2012
Hello,
No, factors may use less memory. System dependent?
> x <-sample(c("small","medium","large"),1e4,rep=TRUE)
> y <- factor(x)
> object.size(x)
80184 bytes
> object.size(y)
40576 bytes
>
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Portuguese_Portugal.1252 LC_CTYPE=Portuguese_Portugal.1252
[3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Portugal.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Rcapture_1.2-0 xts_0.8-0 zoo_1.7-7
loaded via a namespace (and not attached):
[1] chron_2.3-39 fortunes_1.4-2 grid_2.15.1 lattice_0.20-6 tools_2.15.1
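A likely reason the results differ between systems: a character vector holds one pointer per element into R's global string cache. That pointer takes 4 bytes on a 32-bit build and 8 bytes on a 64-bit build. A factor holds one 4-byte integer code per element, plus its levels. This matches the 64-bit output above. Bert's numbers quoted below would fit a 32-bit build, but that is my guess, since his sessionInfo() is not shown. Here is a minimal sketch to check this on your own system. The seed and the variable name ptr_bytes are my own additions for the example:

```r
# Compare memory use of a character vector and the equivalent factor.
# object.size() is in the utils package, which is attached by default.
set.seed(1)  # only to make the sample reproducible
x <- sample(c("small", "medium", "large"), 1e4, replace = TRUE)
y <- factor(x)

# Each element of a character vector is a pointer into R's global string
# cache, so it costs the pointer size: 4 bytes (32-bit) or 8 bytes (64-bit).
ptr_bytes <- .Machine$sizeof.pointer
cat("pointer size:", ptr_bytes, "bytes\n")

# A factor stores one 4-byte integer code per element, plus its
# "levels" and "class" attributes.
print(object.size(x))
print(object.size(y))
```

On a 64-bit build the factor should come out at roughly half the size of the character vector. On a 32-bit build the two should be close. The exact byte counts may vary with the R version.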
And I agree with what Steve said: stringsAsFactors = FALSE saves hours
of debugging time.
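Here is a small illustration of the kind of bug this avoids, and of Jeff's approach of converting late. The data and column names are made up for the example. Note that stringsAsFactors defaulted to TRUE before R 4.0.0 and to FALSE from R 4.0.0 onward, so passing it explicitly gives the same behaviour in both:

```r
# A factor only accepts values that are already among its levels.
f <- factor(c("small", "large", "small"))
f[2] <- "medium"        # warns "invalid factor level, NA generated"
print(f)                # the second element is now NA

# Keeping the column as character avoids this during import and QC.
df <- data.frame(id = 1:3,
                 size = c("small", "large", "small"),
                 stringsAsFactors = FALSE)
df$size[2] <- "medium"  # plain character assignment, no NA
stopifnot(is.character(df$size))

# Convert late, choosing the level order explicitly.
df$size <- factor(df$size, levels = c("small", "medium", "large"))
print(levels(df$size))
```

read.csv() and read.table() accept the same stringsAsFactors argument, so the same approach works when importing data.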
Rui Barradas
On 17-08-2012 19:19, Bert Gunter wrote:
> Steve, et al.:
>
> Yes, if object.size() is to be believed, you're right:
>
>> x <-sample(c("small","medium","large"),1e4,rep=TRUE)
>> y <- factor(x)
>> object.size(x)
> 40120 bytes
>> object.size(y)
> 40336 bytes
>
> I stand (happily) corrected.
>
> -- Bert
>
> On Fri, Aug 17, 2012 at 11:09 AM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
>> Hi,
>>
>> On Fri, Aug 17, 2012 at 1:58 PM, Jeff Newmiller
>> <jdnewmil at dcn.davis.ca.us> wrote:
>>> I don't know if my recent post on this prompted your post, but I don't see much to argue with in your discussion. I find factors to be useful for managing display and some kinds of analysis.
>>>
>>> However, I find them mostly a handicap when importing, merging, and handling data QC. Therefore I delay conversion until late in the game... but usually I do eventually convert in most cases.
>> Agreed here -- I actually haven't been tuned in to any such recent
>> conversation (if there was one), but if I were a gambling man, I'd bet
>> that the majority of the problems people have with factors can
>> probably be boiled down to the fact that the default value for
>> stringsAsFactors is TRUE.
>>
>> I like factors -- that said, I am annoyed by them at times, but I
>> still like them.
>>
>> Also, Bert mentioned that he thinks they save space over characters --
>> I believe that this is no longer true, but I'm not certain.
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>> | Memorial Sloan-Kettering Cancer Center
>> | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
>