[R] Technique for reading large sparse fwf data file

Doran, Harold HDoran at air.org
Tue Dec 13 14:36:45 CET 2005


I should also have noted in this email how I have allocated memory, and an
error that appears.

I'm using Windows, so, as described in FAQ 2.2, I did

"C:\Program Files\R\R-2.2.0\bin\Rgui.exe" --sdi --max-mem-size=2Gb

# Check memory size in R
> example(memory.size)

mmry.s> memory.size()
[1] 11894064

mmry.s> memory.size(TRUE)
[1] 12500992

mmry.s> round(memory.limit()/1048576, 2)
[1] 2048

An interesting issue appears after trying to import the subset of the
larger file (a 75,238 KB csv file). R indicates it has run out of
memory:

Error: vector memory exhausted (limit reached?)
Error: vector memory exhausted (limit reached?)

So, when I then try to quit R, it won't let me. Here is a copy and paste
from my console:

> quit()
Error: vector memory exhausted (limit reached?)
> quit()
Error: recursive default argument reference
> quit()
Error: vector memory exhausted (limit reached?)
> 

Clearly, enough memory is allocated to handle this file. But I also
wonder why R then locks up so that I need to force a shutdown.
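
For what it's worth, below is a minimal sketch of a less memory-hungry read
of that csv subset. The file name ("subset.csv"), the header row, and the
all-integer column make-up are assumptions here, not details from this post;
the key points are that na.strings = "." keeps the dots from forcing whole
columns to character, and colClasses avoids read.csv's type guessing.

## Sketch only: "subset.csv", the header row, and the all-integer columns
## are assumptions, not taken from this post.
dat <- read.csv("subset.csv",
                header     = TRUE,
                na.strings = ".",        # dots become NA rather than text
                colClasses = "integer",  # skip per-column type guessing
                nrows      = 15000)      # rough over-estimate of the row count
object.size(dat)                         # how much memory the frame really uses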

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Doran, Harold
Sent: Tuesday, December 13, 2005 5:33 AM
To: r-help at stat.math.ethz.ch
Subject: [R] Technique for reading large sparse fwf data file

Dear list:

A datafile was sent to me that is very large (92890 x 1620) and is
*very* sparse. Instead of leaving the entries with missing data blank,
each cell with missing data contains a dot (.).

The data are binary in almost all columns, with only a few columns
containing whole numbers; I believe the binary columns require 2 bytes
each and the others 4. So, by my calculations (assuming 4 bytes for all
cells to get an upper bound), I should need around 92890 * 1620 * 4
bytes, or roughly 574 MB, to read in these data, and about twice that
for analyses. My computer has 3 GB of RAM.
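
As a quick sanity check of that arithmetic in R (keeping in mind that R
itself stores integers in 4 bytes and doubles in 8, so the 2-byte figure
would not apply once the data are inside R):

cells <- 92890 * 1620   # 150,481,800 cells in the full file
cells * 4 / 2^20        # 4-byte integers: roughly 574 MB
cells * 8 / 2^20        # 8-byte doubles: roughly 1148 MB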

But I am unable to read in the file, even though I have allocated
sufficient memory to R for this.

My first question is: do the dots in the empty cells consume additional
memory? I am assuming the answer is yes, and that I should remove them
before reading the data in. Because my data are in a fixed-width format
file, I could open the file in a text editor and replace all the dots
with nothing, and then retry the read. Would that work?
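
A possible alternative to hand-editing the file: read.fwf() passes extra
arguments such as na.strings and colClasses on to read.table(), so the dots
can be mapped to NA during the read itself. In the sketch below the file
name and field widths are hypothetical placeholders; the real widths vector
would have to describe all 1620 columns.

## Hypothetical: "big.fwf" and field.widths are placeholders, not real values.
field.widths <- rep(1, 1620)              # e.g. one character per field
dat <- read.fwf("big.fwf",
                widths     = field.widths,
                na.strings = ".",         # dots are read in as NA
                colClasses = "integer")   # assumes whole-number fields only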

I created a smaller data file (~ 14000 x 1620) in SAS and tried to
import this subset (it still had the dots), but R still would not read
it.
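
One possible strategy (a sketch only, with a hypothetical file name and
block size) is to read the file through an open connection in blocks, so
that only one block of rows sits in memory at a time and each block can be
thinned or summarised before anything is kept:

## Sketch only: "subset.csv" (assumed to have no header row) and the
## 5000-row block size are illustrative, not taken from this post.
con <- file("subset.csv", open = "r")
blocks <- list()
repeat {
  block <- tryCatch(
    read.csv(con, header = FALSE, nrows = 5000,
             na.strings = ".", colClasses = "integer"),
    error = function(e) NULL)             # read.csv errors once no lines remain
  if (is.null(block)) break
  ## keep only what is needed from each block before storing it
  blocks[[length(blocks) + 1L]] <- block
}
close(con)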

I could use a little guidance, as I think I have allocated sufficient
memory to read in this datafile, assuming my calculations are right.

Does anyone have any thoughts on a strategy?

Harold



______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html



