[R] Running out of memory when importing SPSS files
Thomas Lumley
tlumley at u.washington.edu
Thu Feb 19 09:26:56 CET 2009
On Wed, 18 Feb 2009, Uwe Ligges wrote:
> dobomode wrote:
>> Hello R-help,
>>
>> I am trying to import a large dataset from SPSS into R. The SPSS file
>> is in .SAV format and is about 1GB in size. I use read.spss to import
>> the file and get an error saying that I have run out of memory. I am
>> on a MAC OS X 10.5 system with 4GB of RAM. Monitoring the R process
>> tells me that R runs out of memory when reaching about 3GB of RAM so I
>> suppose the remaining 1GB is used up by the OS.
>>
>> Why would a 1GB SPSS file take up more than 3GB of memory in R?
>
> Because SPSS stores data in a compressed way?
Or because R needs quite a lot more memory while reading a data set than the data eventually occupy. Either way, even if the data set eventually took up only 1GB in R, you would still probably not be able to work usefully with it on a 32-bit machine.
You need to either use a 64-bit system or avoid loading the whole data set. Unfortunately, read.spss can't read the data selectively [something I'd like to fix, sometime], but if you had a .csv file you could read a subset of columns or rows using read.table.
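For example (a minimal sketch: the file name, column count, and column positions here are made up, and it assumes the data had first been exported from SPSS as comma-separated text):

## Suppose "dataset.csv" has 50 columns and we only want columns 2 and 5.
## colClasses = "NULL" tells read.table to skip a column entirely, so
## the skipped columns never occupy memory.
cc <- rep("NULL", 50)
cc[c(2, 5)] <- NA        # NA lets read.table work out the type itself
few_cols <- read.table("dataset.csv", header = TRUE, sep = ",",
                       colClasses = cc)

## Or read only the first 10000 rows:
few_rows <- read.table("dataset.csv", header = TRUE, sep = ",",
                       nrows = 10000)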
A better bet is likely to be putting the data set into a database (SQLite is easiest) and reading subsets of the data that way. That's how I handle data sets of a few GB (on a laptop with 1GB of memory).
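Something along these lines with the RSQLite package (again just a sketch: the file, database, table, and variable names are invented, and it assumes the data are already in .csv form):

library(RSQLite)   # also loads DBI
db <- dbConnect(SQLite(), dbname = "survey.db")

## Feed the .csv into the database in 100000-row chunks, so the whole
## data set never has to sit in R's memory at once:
input <- file("dataset.csv", open = "r")
chunk <- read.csv(input, nrows = 100000)   # header plus first chunk
vars <- names(chunk)
dbWriteTable(db, "survey", chunk)          # creates the table
repeat {
    chunk <- tryCatch(read.csv(input, header = FALSE,
                               col.names = vars, nrows = 100000),
                      error = function(e) NULL)   # an error here means EOF
    if (is.null(chunk) || nrow(chunk) == 0) break
    dbWriteTable(db, "survey", chunk, append = TRUE)
}
close(input)

## Then pull back only the variables and cases you actually need:
few <- dbGetQuery(db, "SELECT age, income FROM survey WHERE region = 3")
dbDisconnect(db)

SQLite keeps the data on disk, so each query only brings the rows it returns into R.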
-thomas
Thomas Lumley
Assoc. Professor, Biostatistics
University of Washington, Seattle
tlumley at u.washington.edu