[R] Handling large data sets via scan()

Nawaaz Ahmed nawaaz at inktomi.com
Fri Feb 4 07:40:17 CET 2005


I'm trying to read in datasets with roughly 150,000 rows and 600
features. I wrote a function using scan() to read them in (I have a 4GB
Linux machine) and it works like a charm.  Unfortunately, converting the
scanned list into a data.frame using as.data.frame() causes memory
usage to explode (it can go from 300MB for the scanned list to 1.4GB for
a data.frame of 30,000 rows), and the conversion fails claiming it cannot
allocate memory (even though it is still not close to the 3GB per-process
limit on my Linux box - the message is "unable to allocate vector of size 522K").
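For concreteness, the pattern I'm using looks roughly like the sketch
below (the file name, the placeholder column names, and the assumption
that every column is numeric are for illustration only, not the details
of my actual data):

    ncols <- 600
    template <- rep(list(numeric(0)), ncols)           # one list slot per column
    names(template) <- paste("V", 1:ncols, sep = "")   # placeholder column names
    cols <- scan("data.txt", what = template)          # reads the file into a list of vectors
    df <- as.data.frame(cols)                          # this conversion is where memory explodes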

So I have three questions --

1) Why is it failing even though there seems to be enough memory available?

2) Why does converting it into a data.frame cause memory usage to
explode? Am I using as.data.frame() incorrectly? Should I be using some
other command?

3) All the model-fitting packages seem to want data.frames as
their input. If I cannot convert my list into a data.frame, what can I
do? Is there any way around this?

Many thanks!
Nawaaz
