[R] Handling large data sets via scan()
Christoph Lehmann
christoph.lehmann at gmx.ch
Fri Feb 4 10:28:34 CET 2005
Does it partially solve your problem if you use read.table() instead of
scan(), since read.table() imports the data directly into a data.frame?
Let me know if it helps.
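
A minimal sketch of what I mean (the file name and the all-numeric
column layout are my assumptions; adjust colClasses to your actual 600
features). Declaring colClasses and nrows up front lets read.table()
allocate each column at its final type and size, which avoids most of
the intermediate copies:

  dat <- read.table("bigdata.txt", header = TRUE,
                    colClasses = rep("numeric", 600), # column types known up front
                    nrows = 150000,                   # allocate once, no re-growing
                    comment.char = "")                # skip comment scanning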
Nawaaz Ahmed wrote:
> I'm trying to read in datasets with roughly 150,000 rows and 600
> features. I wrote a function using scan() to read it in (I have a 4GB
> linux machine) and it works like a charm. Unfortunately, converting the
> scanned list into a data.frame using as.data.frame() causes the memory
> usage to explode (it can go from 300MB for the scanned list to 1.4GB for
> a data.frame of 30000 rows) and it fails claiming it cannot allocate
> memory (though it is still not close to the 3GB limit per process on my
> linux box - the message is "unable to allocate vector of size 522K").
>
> So I have three questions --
>
> 1) Why is it failing even though there seems to be enough memory available?
>
> 2) Why is converting it into a data.frame causing the memory usage to
> explode? Am I using as.data.frame() wrongly? Should I be using some
> other command?
>
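Regarding (2): as.data.frame() builds its result by copying the
components of the list, so memory use can transiently jump to several
times the size of the scanned data. One workaround is to give scan() a
list of column types and then promote the resulting list to a
data.frame directly, avoiding as.data.frame()'s copies; a sketch, again
assuming 600 numeric columns (the names are made up):

  cols <- scan("bigdata.txt", what = rep(list(double()), 600))
  names(cols) <- paste("V", 1:600, sep = "")
  attr(cols, "row.names") <- 1:length(cols[[1]]) # set row names by hand
  class(cols) <- "data.frame"                    # the list becomes a data.frame
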
> 3) All the model fitting packages seem to want to use data.frames as
> their input. If I cannot convert my list into a data.frame what can I
> do? Is there any way of getting around this?
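Regarding (3): if all of your features are numeric, a plain matrix is
much leaner than a data.frame, and some fitting routines take matrices
directly, e.g. lm.fit(), the workhorse behind lm(). A sketch (the file
name and using the first column as the response are assumptions on my
part):

  m <- matrix(scan("bigdata.txt", what = double()),
              ncol = 600, byrow = TRUE)    # one row per observation
  fit <- lm.fit(x = cbind(1, m[, -1]),     # prepend an intercept column
                y = m[, 1])                # first column as the response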
>
> Much thanks!
> Nawaaz
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html