[R] TM Package - Corpus function - Memory Allocation Problems

David Winsemius dwinsemius at comcast.net
Tue Aug 17 22:06:53 CEST 2010

On Aug 17, 2010, at 3:45 PM, Guelman, Leo wrote:

> I'm using R 2.11.1 on Win XP (32-bit) with 3 GB of RAM. My data has
> (only) 16.0 MB.

Probably more than that once it is in memory. Each numeric is 8 bytes
even before overhead, so a csv file that was all single-digit integers
and commas would more than double in size unless they were declared to
be integer in the read step.
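
To put rough numbers on that, here is a minimal sketch comparing the size of a
file on disk with the same values held in memory. The data is made up
(100,000 random single digits) and the temporary file is only for
illustration; exact byte counts will vary a little by platform.

```r
# Hypothetical data: 100,000 single-digit values
n <- 1e5
digits <- sample(0:9, n, replace = TRUE)

as_integer <- as.integer(digits)   # 4 bytes per element
as_numeric <- as.numeric(digits)   # 8 bytes per element

# Write a one-column csv to a temporary file to get the on-disk size
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(v = as_integer), tmp, row.names = FALSE)

sizes <- c(file_bytes    = file.info(tmp)$size,
           integer_bytes = as.numeric(object.size(as_integer)),
           numeric_bytes = as.numeric(object.size(as_numeric)))
print(sizes)
unlink(tmp)
```

If a column is known to hold integers, the colClasses argument of read.csv
can be used to say so explicitly at read time.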

> I want to create a VCorpus object using the Corpus function in the tm
> package but I'm running into Memory allocation issues: "Error: cannot
> allocate vector of size 372 Kb".
> My data is stored in a csv file which I've imported with "read.csv" and
> then used the following to create the Corpus (but it failed with the
> error message above)
> txt <- Corpus(DataframeSource(txt))

You probably have other objects in your workspace. When I want to know  
what is taking up the most space I use this function:

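getsizes <- function() {
    # size in bytes of every object in the global environment
    z <- sapply(ls(envir = globalenv()),
                function(x) object.size(get(x, envir = globalenv())))
    as.matrix(head(sort(z, decreasing = TRUE), 10))  # ten largest, no NA padding
}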
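
For anyone who wants to try the idea without touching their real workspace,
here is a small variant as a sketch. The name top_sizes and its arguments are
my own invention for illustration, not part of any package; it takes the
environment to inspect and reports sizes in megabytes.

```r
# Hypothetical variant: inspect any environment, report the n largest in MB
top_sizes <- function(env = globalenv(), n = 10) {
    nms <- ls(envir = env)
    if (length(nms) == 0) return(numeric(0))
    bytes <- sapply(nms, function(x)
        as.numeric(object.size(get(x, envir = env))))
    mb <- sort(bytes, decreasing = TRUE) / 1024^2
    head(round(mb, 2), n)
}

# Usage on a throwaway environment with one small and one large object
demo <- new.env()
assign("small", 1:10, envir = demo)
assign("large", numeric(1e6), envir = demo)
res <- top_sizes(demo)
print(res)
```

Calling top_sizes() with no arguments looks at the global environment, the
same as the function above.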

Clearing out your workspace by removing everything might be the best
approach, since the memory allocated to new objects needs to be
contiguous. You ought to make sure that you are not running tons of
other windoze applications that are restricting your default 2 GB limit.
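
A minimal sketch of the clean-up step, using a made-up object name
(leftover_big_object) as a stand-in for whatever is left over in the
workspace:

```r
# Hypothetical large object standing in for leftover workspace data
leftover_big_object <- numeric(1e6)
present_before <- exists("leftover_big_object")

rm(leftover_big_object)   # remove the object from the workspace
invisible(gc())           # run the garbage collector

present_after <- exists("leftover_big_object")

# To clear everything in the workspace rather than one object:
# rm(list = ls())
```

It is easiest to do this in a fresh session before reading the data in, so
that nothing else is occupying memory when the Corpus is built.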


> I've even tried to subset ~ 10% of my data but I run into the same
> error. What is the best way to solve this memory problem other than
> increasing physical RAM?
> Thanks in advance for any help,
> Leo.

David Winsemius, MD
West Hartford, CT
