[R] Opinion: Why I find factors convenient to use
Petr Savicky
savicky at cs.cas.cz
Sat Aug 18 09:48:26 CEST 2012
On Fri, Aug 17, 2012 at 07:34:35PM +0100, Rui Barradas wrote:
> Hello,
>
> No, factors may use less memory. System dependent?
>
> > x <- sample(c("small", "medium", "large"), 1e4, replace = TRUE)
> > y <- factor(x)
> > object.size(x)
> 80184 bytes
> > object.size(y)
> 40576 bytes
> >
> > sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=Portuguese_Portugal.1252 LC_CTYPE=Portuguese_Portugal.1252
> [3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C
> [5] LC_TIME=Portuguese_Portugal.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] Rcapture_1.2-0 xts_0.8-0 zoo_1.7-7
>
> loaded via a namespace (and not attached):
> [1] chron_2.3-39 fortunes_1.4-2 grid_2.15.1 lattice_0.20-6 tools_2.15.1
>
>
> And I agree with what Steve said: stringsAsFactors = FALSE saves hours
> of debugging time.
Hi.
I use stringsAsFactors = FALSE quite frequently. If there is a discussion
on R-devel about whether this should be the default, I would support it.
Factors are very useful and sometimes necessary, but they are hard to
manipulate: for example, assigning a value that is not one of the existing
levels yields NA rather than a new level (R only issues a warning).
As Jeff Newmiller said, a good strategy is to prepare the data as character
vectors and convert them to factors once the data are complete. Users should
know how to use factors; however, the strategy "convert to a factor
eventually" fits better with stringsAsFactors = FALSE as the default than
with stringsAsFactors = TRUE.
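A minimal sketch of the "keep it character, convert to a factor eventually" strategy, assuming a small hypothetical data frame `raw` with a `size` column (the names and values are illustrative, not from the thread). Note that `trimws()` needs R >= 3.2.0, newer than the R 2.15.1 shown above. The last lines illustrate the manipulation problem: a value outside the existing levels becomes NA.

```r
# Hypothetical raw data; in practice this might come from
# read.csv(..., stringsAsFactors = FALSE). The argument is spelled out
# for older R versions; FALSE is the default since R 4.0.0.
raw <- data.frame(size = c("small", "Medium", " large", "small"),
                  stringsAsFactors = FALSE)

# While the column is character, cleaning is ordinary string work
raw$size <- tolower(trimws(raw$size))

# Adding a new category is plain assignment: no levels to manage yet
raw$size[4] <- "extra large"

# Convert once the data are complete, fixing the level order explicitly
raw$size <- factor(raw$size,
                   levels = c("small", "medium", "large", "extra large"))
print(levels(raw$size))

# Contrast: the same kind of assignment on an existing factor.
# "large" is not a level of f, so R warns
# "invalid factor level, NA generated" and stores NA.
f <- factor(c("small", "medium"))
f[2] <- "large"
print(f)
```

Doing the cleaning on character vectors and calling `factor()` once at the end also lets you choose the level order deliberately instead of accepting the alphabetical order that automatic conversion would produce.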
Petr Savicky.
More information about the R-help
mailing list