[R] R + memory of objects

Marc Jekel mjekel at uni-bonn.de
Fri Dec 2 16:17:49 CET 2011


Dear R community,

I am still struggling a bit with how R does memory allocation and how to optimize my code to minimize working-memory load. Simon (thanks!) and others gave me the hint to use the command "gc()" to clean up memory, which works quite nicely but seems to me more like a workaround than a real fix.
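As a side note, here is a minimal sketch (base R only, with an arbitrarily chosen size) of how memory can also be inspected from inside R: object.size() reports the size of a single object, and gc() reports current and maximum memory use:

gc(reset = TRUE)                      # reset the "max used" columns before the test
y <- matrix(1, nrow = 1e6, ncol = 2)  # small example object (size chosen arbitrarily)
print(object.size(y), units = "Mb")   # memory held by y itself
gc()                                  # "used" vs. "max used" shows transient allocations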

To give you an impression of what I am talking about, here is a short code example. For each computational step I also give a rough measure of the working memory used, taken from the system task manager (64-bit R, latest version, on 64-bit Windows 7, 2 cores, approx. 4 GB RAM):

##########################

# example 1:

y = matrix(rep(1, 50000000), nrow = 50000000/2, ncol = 2)

# used working memory increases from 1044 -->  1808 MB

# (same command again, i.e.)

y = matrix(rep(1, 50000000), nrow = 50000000/2, ncol = 2)

# 1808 MB -->  2178 MB Why does memory increase?

# (give the matrix column names)

colnames(y) = c("col1", "col2")

# 2178 MB -->  1781 MB Why does the size of an object decrease if I assign column labels?

###

# example 2:

y = matrix(rep(1, 50000000), nrow = 50000000/2, ncol = 2)

# used working memory increases from 1016 --> 1780 MB

y = data.frame(y)

# increase from 1780 MB -->  3315 MB

##########################
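For what it is worth, here is a smaller-scale sketch of example 1 (1e6 rows instead of 25e6, a size chosen arbitrarily) with a gc() call after each step, which should separate the memory actually retained by y from transient allocations made along the way; the numbers will of course differ from the ones above:

gc(reset = TRUE)
y <- matrix(rep(1, 2000000), nrow = 1000000, ncol = 2)
gc()   # "max used" exceeding "used" indicates transient copies (e.g. the rep() result)

y <- matrix(rep(1, 2000000), nrow = 1000000, ncol = 2)   # the old y becomes garbage
gc()                                                     # ... and is reclaimed here

colnames(y) <- c("col1", "col2")      # attaches a dimnames attribute to y
print(object.size(y), units = "Mb")   # size of y including the attribute
gc()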

Why does it take so much extra memory to store this matrix as a data.frame?

It is not the object per se (i.e., not that data.frames inherently need more memory), because if I call gc() afterwards the memory usage drops to 1387 MB. Does this mean it may be more memory-efficient to avoid data.frames and work with matrices only?
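To check whether the extra memory lives in the data.frame itself or only in temporary copies made during the conversion, I imagine something like the following could help (again with a smaller, arbitrarily chosen size):

m <- matrix(1, nrow = 1000000, ncol = 2)
d <- data.frame(m)                    # same conversion as in example 2, at a smaller size

print(object.size(m), units = "Mb")   # memory held by the matrix
print(object.size(d), units = "Mb")   # memory held by the data.frame built from it
gc()                                  # "max used" shows how much was needed transiently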

This puzzles me a lot. In my experience these effects become even more pronounced for larger objects.

As an anecdotal comparison: because of these memory problems I also used Stata in my last project, and I could do a lot of variable manipulations on the same (!) data with significantly less memory (we are talking about gigabytes).

Best,

Marc


