[R] R Memory Usage Concerns
Thomas Lumley
tlumley at u.washington.edu
Tue Sep 15 16:15:12 CEST 2009
On Tue, 15 Sep 2009, Evan Klitzke wrote:
> On Mon, Sep 14, 2009 at 10:01 PM, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
>> As already suggested, you're (much) better off if you specify colClasses, e.g.
>> tab <- read.table("~/20090708.tab", colClasses=c("factor", "double", "double"));
>> Otherwise, R has to load all the data, make a best guess of the column
>> classes, and then coerce (which requires a copy).
> Thanks Henrik, I tried this as well as a variant that another user
> sent me privately. When I tell R the colClasses, it does a much better
> job of allocating memory (ending up with 96M of RSS memory, which
> isn't great but is definitely acceptable).
> A couple of notes I made from testing some variants, if anyone else is
> interested:
> * giving it an nrows argument doesn't help it allocate less memory
> (just a guess, but maybe because it's trying the powers-of-two
> allocation strategy in both cases)
> * there's no difference in memory usage between telling it a column
> is "numeric" vs "double"
Because they are the same type.
> * when telling it the types in advance, loading the table is much, much faster
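As a rough illustration of the two notes above (declaring colClasses, and "numeric" being the same type as "double"), here is a minimal sketch. The three-row table and the names path, tab, col_types and same_type are made up for the example; they are not the poster's data.

```r
# Small three-column table written to a temporary file (made-up data).
path <- tempfile(fileext = ".tab")
writeLines(c("a 1.5 10", "b 2.5 20", "a 3.5 30"), path)

# Declaring the classes up front means read.table does not have to guess
# the column types and coerce afterwards.
tab <- read.table(path, colClasses = c("factor", "numeric", "numeric"))
unlink(path)

# Storage type of each column: the factor is stored as integer codes,
# the two numeric columns as doubles.
col_types <- sapply(tab, typeof)
print(col_types)

# "numeric" and "double" name the same storage type for vectors.
same_type <- identical(numeric(3), double(3)) &&
  identical(as.numeric("1.5"), as.double("1.5"))
print(same_type)
```

The typeof() of both numeric columns comes back as "double", which is why switching the label between "numeric" and "double" should not change memory use.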
> Maybe if I gather some more fortitude in the future, I'll poke around
> at the internals, since I'm still curious where the extra memory is
> going. Is that just the overhead of allocating a full object for each
> value (i.e. rather than just a double[] or whatever)?
No. R does not allocate a full object for each value; it allocates a single double[] plus a
constant amount of overhead. R has no scalar types, so there is no such thing as an object
for a single value, only vectors with a single element. R will use more memory than the size of
the data set itself, because it makes temporary copies of things.
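
A short sketch of the points above, using only base R. The variable names (x, lens, sizes, per_element) are chosen for the example, and object.size() only reports an estimate of the memory attributed to an object, so treat the numbers as indicative.

```r
# A single number is just a double vector of length one.
x <- 3.14
print(c(is_vector = is.vector(x), length = length(x)))
print(typeof(x))

# Estimated size of double vectors of increasing length.
lens  <- c(1e3, 1e4, 1e5)
sizes <- sapply(lens, function(n) as.numeric(object.size(numeric(n))))

# Extra bytes per extra element: the fixed header cancels in the
# difference, leaving the cost of one double.
per_element <- diff(sizes) / diff(lens)
print(per_element)
```

The per-element cost is that of a plain double, so the storage of the data itself is compact; memory use beyond that comes from the temporary copies made while reading and coercing.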
Thomas Lumley                   Assoc. Professor, Biostatistics
tlumley at u.washington.edu     University of Washington, Seattle