[R] Memory consumption, integer versus factor
Ajay Narottam Shah
ajayshah at mayin.org
Sat Apr 30 06:44:13 CEST 2005
R is so smart! I found that when you switch a column from integer to
factor, the memory consumption goes down rather impressively.
Now I'd like to learn more. How does R do this? What does R do? How do
I learn more?
I got to thinking: If I was really smart, I'd see that a factor with 2
levels requires only 1 bit of storage. So I'd be able to cram 8 such
factors into a byte. But this would come at the price of complexity of
code since reading and writing that object would require sub-byte
operations. Does R go this far? I think not, given the more modest
gains that I see. Does it go down to a byte? Or to a four-byte word
instead of 8 bytes of storage?
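One way to probe this directly is to ask R what it uses underneath each kind of vector. The sketch below is only a rough check: it assumes that typeof() and object.size() reflect the underlying storage, and the names x_dbl, x_int, x_fac and the seed are made up for illustration.

```r
# Compare how R stores the same 0/1 data as double, integer and factor.
set.seed(1)                            # arbitrary seed, only for reproducibility
x_dbl <- as.numeric(runif(1e6) > .5)   # doubles, as in the demo below
x_int <- as.integer(x_dbl)             # the same values as integers
x_fac <- factor(x_dbl)                 # the same values as a 2-level factor

typeof(x_dbl)    # "double"
typeof(x_int)    # "integer"
typeof(x_fac)    # "integer": a factor is integer codes plus a levels attribute
levels(x_fac)    # "0" "1"

# Approximate size of each vector, in bytes per element
sizes <- sapply(list(double = x_dbl, integer = x_int, factor = x_fac),
                function(v) as.numeric(object.size(v)))
sizes / length(x_dbl)
```

On a typical build this should come out at roughly 8 bytes per element for the double vector and roughly 4 for both the integer vector and the factor. That would suggest the factor is held as ordinary four-byte integer codes, with no sub-byte or single-byte packing.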
What are Ncells and Vcells, and what determines its consumption of
memory for each kind?
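As far as I can tell from ?gc and ?Memory, Ncells counts fixed-size cons cells (language objects and vector headers) and Vcells counts 8-byte units of vector heap. A small experiment along these lines might show which row moves when a large vector is allocated; the variable names are mine, and the exact counts will vary between sessions.

```r
g <- gc()
rownames(g)                        # "Ncells" "Vcells"

before <- gc()["Vcells", "used"]   # vector cells in use after a collection
y <- numeric(1e6)                  # one million doubles, 8 bytes each
after <- gc()["Vcells", "used"]

after - before                     # should be close to 1e6 Vcells
```

If the difference is close to one million, that is consistent with each double occupying one Vcell, so an integer or factor vector of the same length should need about half as many.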
If you're curious about this, here's a program that serves as a demo:
x <- matrix(as.numeric(runif(1e6)>.5), nrow=100000)
D <- data.frame(x)
rm(x)
# Take stock:
gc()
sum(gc()[,2])
object.size(D)
# Switch to factors --
# Convert every column in place; D[] keeps D a data.frame with the same names.
D[] <- lapply(D, factor)
# Take stock:
gc()
sum(gc()[,2])
object.size(D)
Using this, I find that the cost of these 10 vectors goes down from 12
Meg to 8 Meg. This suggests savings, but not the dramatic impact of
recognising that a factor with 2 levels only requires 1 bit.
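For reference, here is the back-of-the-envelope arithmetic for the 1e6 values in the demo under each storage scheme I can think of. It is a sketch only: it ignores per-vector headers and the levels attribute, and it assumes 1 Meg = 2^20 bytes.

```r
n <- 1e6   # total number of values across the 10 columns

# Bytes needed per value under each hypothetical storage scheme
bytes_per_value <- c(double = 8, integer = 4, byte = 1, bit = 1/8)

# Total size in Meg (2^20 bytes)
meg <- n * bytes_per_value / 2^20
round(meg, 2)   # about 7.63, 3.81, 0.95 and 0.12
```

The drop of about 4 Meg that I observe matches the difference between the double and integer rows, not the byte or bit rows.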
--
Ajay Shah
Consultant, Department of Economic Affairs
Ministry of Finance, New Delhi
ajayshah at mayin.org
http://www.mayin.org/ajayshah