[R] Handling large dataset & dataframe
andy_liaw at merck.com
Mon Apr 24 21:07:22 CEST 2006
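Instead of reading the entire data set in at once, you read a chunk at a time,
compute X'X and X'y on that chunk, and accumulate (i.e., add) them across
chunks. The accumulated sums equal the full-data X'X and X'y, so the
least-squares coefficients come out of a small p-by-p system.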
There are examples in "S Programming", taken from independent replies by the
two authors to a post on S-news, if I remember correctly.
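
A rough sketch of the chunked accumulation follows. It is not the code from
"S Programming"; the function name chunked_ols, its arguments, and the file
layout (comma-separated, one header row, all columns numeric, dummies already
coded 0/1) are assumptions made for illustration only.

```r
# Hypothetical helper: accumulate X'X and X'y over chunks of a CSV file,
# then solve the normal equations. Assumes a header row and numeric columns.
chunked_ols <- function(path, yname, chunk_size = 10000) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Read the header line once; scan() strips any quotes around the names.
  header <- scan(text = readLines(con, n = 1), what = "", sep = ",",
                 quiet = TRUE)
  xnames <- setdiff(header, yname)

  xtx <- NULL  # running sum of X'X (p x p)
  xty <- NULL  # running sum of X'y (p x 1)

  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break  # end of file

    chunk <- read.csv(text = lines, header = FALSE, col.names = header,
                      check.names = FALSE)
    X <- cbind(Intercept = 1, as.matrix(chunk[, xnames, drop = FALSE]))
    y <- chunk[[yname]]

    if (is.null(xtx)) {
      xtx <- crossprod(X)      # X'X for the first chunk
      xty <- crossprod(X, y)   # X'y for the first chunk
    } else {
      xtx <- xtx + crossprod(X)
      xty <- xty + crossprod(X, y)
    }
  }

  # Solve (X'X) b = X'y; only the small cross-product matrices are in memory.
  drop(solve(xtx, xty))
}
```

Something like chunked_ols("mydata.csv", "y") (file and column names made up)
would then return the coefficient vector. Solving the normal equations
directly is less numerically stable than the QR approach lm() uses, and
solve() will fail if the dummy columns are collinear with the intercept, so
treat this as a starting point only.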
From: Sachin J
> Can you elaborate more?
> Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> You just need the much smaller cross product matrix X'X and
> vector X'Y so you can build those up as you read the data in
> in chunks.
> On 4/24/06, Sachin J wrote:
> > Hi,
> > I have a dataset consisting of 350,000 rows and 266 columns. Out of
> > 266 columns 250 are dummy variable columns. I am trying to read this
> > data set into R dataframe object but unable to do it due to memory
> > size limitations (object size created is too large to handle in R). Is
> > there a way to handle such a large dataset in R.
> > My PC has 1GB of RAM, and 55 GB harddisk space running windows XP.
> > Any pointers would be of great help.
> > TIA
> > Sachin
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html