[R] Memory consumption, integer versus factor

Duncan Murdoch murdoch at stats.uwo.ca
Sat Apr 30 11:50:26 CEST 2005


Ajay Narottam Shah wrote:
> R is so smart! I found that when you switch a column from integer to
> factor, the memory consumption goes down rather impressively.
> 
> Now I'd like to learn more. How does R do this? What does R do?

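Most numeric variables are stored as 8-byte doubles.  Factors are stored
as 4-byte integer codes, plus a table giving the factor levels, so
converting a column of doubles to a factor roughly halves its size.  (A
column whose type really is integer already uses 4 bytes per element;
typeof() will tell you which one you have.)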
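
To see the difference yourself, object.size() (in the utils package, which is attached by default) reports roughly how much memory an object occupies.  A minimal sketch using a hypothetical column of one million small whole numbers; the variable names are made up, and exact byte counts include a small per-object header that varies by platform:

```r
# Hypothetical data: 1e6 small whole numbers, stored as doubles
set.seed(1)
x_dbl <- as.numeric(sample(1:5, 1e6, replace = TRUE))
x_int <- as.integer(x_dbl)  # same values, stored as 4-byte integers
x_fac <- factor(x_dbl)      # 4-byte integer codes plus a levels attribute

typeof(x_dbl)  # "double"
typeof(x_fac)  # "integer"

print(object.size(x_dbl), units = "Kb")  # about 8 bytes per element
print(object.size(x_int), units = "Kb")  # about 4 bytes per element
print(object.size(x_fac), units = "Kb")  # about 4 bytes per element, plus the levels
```

The factor and the plain integer vector come out nearly the same size; the saving comes from leaving the 8-byte double representation.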

> How do I learn more?

You will sometimes find what you want in the R Language Definition, for 
example here:

"Factors are currently implemented using an integer array to specify the
actual levels and a second array of names that are mapped to the
integers. Rather unfortunately users often make use of the
implementation in order to make some calculations easier. This,
however, is an implementation issue and is not guaranteed to hold in
all implementations of R."
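
The structure the quote describes is easy to see at the R prompt.  A small illustration with made-up values; note that it deliberately peeks at the implementation the quote warns about, so treat it as a look under the hood rather than something to build on:

```r
f <- factor(c("lo", "hi", "hi", "lo", "mid"))

levels(f)      # "hi" "lo" "mid": the table of names (sorted by default)
as.integer(f)  # 2 1 1 2 3: one integer code per element, indexing levels(f)
typeof(f)      # "integer": the underlying storage type
attributes(f)  # $levels and $class are all that is kept besides the codes
```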

For more details, there are some implementation documents on 
developer.r-project.org, but in general the only sure way to find out 
how something is implemented is to look at the source code.

Usually it's a bad idea to rely on the implementation details, as the 
last sentence quoted above says.  If it's not documented, it's subject 
to change without warning.
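
A common way of leaning on the implementation is converting a factor of numbers back with as.numeric(), which returns the internal codes rather than the values.  A small sketch with made-up data (the name f_num is hypothetical); going through the character labels avoids depending on the codes:

```r
f_num <- factor(c(10, 20, 10, 5))  # levels are "5", "10", "20"

as.numeric(f_num)                # 2 3 2 1: the internal codes, not the data
as.numeric(as.character(f_num))  # 10 20 10 5: converts the level labels instead
```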

> 
> I got to thinking: If I were really smart, I'd see that a factor with 2
> levels requires only 1 bit of storage. So I'd be able to cram 8 values
> of such a factor into a byte. But this would come at the price of more
> complex code, since reading and writing that object would require
> sub-byte operations. Does R go this far? I think not, given the more
> modest gains that I see. Does it go down to a byte? A four-byte word
> instead of 8 bytes of storage?
> 
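
R does not go that far: a factor spends a full 4-byte integer code on every element, however few levels it has.  If 1-bit storage mattered enough, you could pack the bits yourself.  A minimal hand-rolled sketch (not anything R does internally) using base R's packBits() and rawToBits(), with a hypothetical two-level factor g; every access would need an unpacking step like the one at the end, which is exactly the complexity cost you describe:

```r
# Hypothetical two-level factor with 1e6 elements (a multiple of 8)
set.seed(1)
g <- factor(sample(c("no", "yes"), 1e6, replace = TRUE))
print(object.size(g), units = "Kb")       # about 4 bytes per element

# Pack: one bit per element, eight elements per byte
bits   <- as.integer(g) - 1L              # 0 for "no", 1 for "yes"
packed <- packBits(bits, type = "raw")
print(object.size(packed), units = "Kb")  # about 1/8 byte per element

# Unpack: rawToBits() returns one 00/01 byte per bit, in the same order
codes    <- as.integer(rawToBits(packed)) + 1L
unpacked <- factor(levels(g)[codes], levels = levels(g))
stopifnot(identical(unpacked, g))
```
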
> What are Ncells and Vcells, and what determines its consumption of
> memory for each kind?

See the man pages ?gc, ?Memory, and the source code.
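
As a starting point: per ?Memory, Ncells are "cons cells" (the building blocks of parse trees and other language objects; 28 bytes each on a 32-bit build, usually 56 on a 64-bit build), while Vcells hold vector data in units of 8 bytes.  A rough sketch of watching the Vcells count move; the variable names are hypothetical and the exact figures depend on your session and platform:

```r
# Vcells in use before and after allocating 1e6 doubles (8 bytes each)
before <- gc()["Vcells", "used"]
x <- numeric(1e6)
after  <- gc()["Vcells", "used"]

after - before  # roughly 1e6: about one 8-byte Vcell per double
gc()            # the full table: Ncells and Vcells, used and trigger
```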

Duncan Murdoch

More information about the R-help mailing list