[R] Memory consumption, integer versus factor
Duncan Murdoch
murdoch at stats.uwo.ca
Sat Apr 30 11:50:26 CEST 2005
Ajay Narottam Shah wrote:
> R is so smart! I found that when you switch a column from integer to
> factor, the memory consumption goes down rather impressively.
>
> Now I'd like to learn more. How does R do this? What does R do?
Most numeric variables are stored as 8-byte doubles. Factors are stored
as 4-byte integers, plus a table giving the factor levels.
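
A rough way to see the difference is to compare object sizes directly. This is only a sketch: the vector length and the names x_dbl, x_int and x_fac are made up for illustration, and the exact sizes reported depend on the R version and platform.

```r
n <- 1e6

# Illustrative data: one million values drawn from five distinct codes.
x_dbl <- as.numeric(sample(1:5, n, replace = TRUE))  # stored as doubles
x_int <- as.integer(x_dbl)                           # stored as integers
x_fac <- factor(x_int)                               # integer codes plus levels

print(object.size(x_dbl), units = "MB")
print(object.size(x_int), units = "MB")
print(object.size(x_fac), units = "MB")
```

The factor should come out close to the integer vector and about half the size of the double vector, which suggests the saving comes from going from doubles to integers rather than from anything special about factors.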
> How do
> I learn more?
You will sometimes find what you want in the R Language Definition, for
example here:
"Factors are currently implemented using an integer array to specify the
actual levels and a second array of names that are mapped to the
integers. Rather unfortunately users often make use of the
implementation in order to make some calculations easier. This, however,
is an implementation issue and is not guaranteed to hold in all
implementations of R."
For more details, there are some implementation documents on
developer.r-project.org, but in general the only sure way to find out
how something is implemented is to look at the source code.
Usually it's a bad idea to rely on the implementation details, as the
last sentence quoted above says. If it's not documented, it's subject
to change without warning.
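
For illustration only, the current representation can be inspected from the R prompt. The factor f below is a made-up example, and what typeof() and unclass() show is exactly the kind of implementation detail that should not be relied on in real code.

```r
# A small made-up factor with three distinct levels.
f <- factor(c("low", "high", "low", "mid"))

typeof(f)        # storage type of the underlying codes
unclass(f)       # the integer codes, carrying a "levels" attribute
levels(f)        # the table of level names
as.integer(f)    # the codes alone

# The documented route back to the labels does not depend on the codes.
as.character(f)
```

Code that works through levels() and as.character() should keep working even if the internal storage changes.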
>
> I got to thinking: If I was really smart, I'd see that a factor with 2
> levels requires only 1 bit of storage. So I'd be able to cram 8 such
> factors into a byte. But this would come at the price of complexity of
> code since reading and writing that object would require sub-byte
> operations. Does R go this far? I think not, given the more modest
> gains that I see. Does he go down till a byte? A four-byte word
> instead of 8-bytes of storage?
>
> What are Ncells and Vcells, and what determines his consumption of
> memory for each kind?
See the man pages ?gc, ?Memory, and the source code.
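
As a starting point, gc() returns a matrix with one row for Ncells and one for Vcells. According to ?Memory, Ncells are the fixed-size cons cells used for language objects and Vcells are the vector heap, counted in 8-byte units. A minimal sketch follows; the names before, after and x are arbitrary and the numbers will differ from session to session.

```r
before <- gc()            # collect garbage and record current usage

x <- numeric(1e6)         # one million doubles on the vector heap

after <- gc()
print(after)              # rows "Ncells" and "Vcells"

# Approximate growth in used Vcells caused by allocating x.
after["Vcells", "used"] - before["Vcells", "used"]
```

The difference should be roughly one million Vcells, since each double takes one 8-byte cell, but other allocations in the session can shift it.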
Duncan Murdoch