[R] Large Dataset

Simon Pickett simon.pickett at bto.org
Tue Jan 6 18:15:02 CET 2009


Increase the memory as much as you can, read in the data (however long it
takes), then aggregate the data into smaller chunks, selecting only the bits
you want.

Remove the big original data set from memory (using rm()) and keep, or save,
the smaller aggregated data (using write.table()).
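
Something along these lines is what I mean (untested, and the file name,
column names, separator and chunk size below are just made up for
illustration, so adapt them to your data): read the file a chunk at a time,
keep only the columns you need, aggregate each chunk, and only ever hold the
small pieces in memory.

con <- file("bigdata.txt", open = "r")
col.names <- unlist(strsplit(readLines(con, n = 1), "\t"))  # header row

pieces <- list()
repeat {
    block <- tryCatch(
        read.table(con, nrows = 100000, sep = "\t",
                   col.names = col.names, stringsAsFactors = FALSE),
        error = function(e) NULL)   # read.table errors once the file is empty
    if (is.null(block)) break
    # keep only the bits you want, e.g. per-group sums of one column
    pieces[[length(pieces) + 1]] <-
        aggregate(block$value, by = list(group = block$group), FUN = sum)
}
close(con)

small <- do.call(rbind, pieces)
rm(pieces); gc()                    # free what is no longer needed
# combine the per-chunk sums into overall sums per group
total <- aggregate(small$x, by = list(group = small$group), FUN = sum)
write.table(total, "aggregated.txt", row.names = FALSE)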

If this doesn't work, you may be out of luck, I'm afraid.

Sorry I can't be of more help, but it seems that if you want to deal with 
colossal data sets, you need the right tools for the job (i.e. a 
better computer or more suitable software).
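
If you did want to go back and try the ff route you mentioned, a very rough
sketch (untested here, and assuming a reasonably recent version of the ff
package) would be something like:

library(ff)
# read.table.ffdf keeps the data in files on disk rather than in RAM;
# extra arguments such as header and sep are passed on to read.table
big <- read.table.ffdf(file = "bigdata.txt", header = TRUE, sep = "\t")
dim(big)   # behaves much like a data.frame, but is disk-backed

I haven't used it myself, though, so the package documentation is the place
to look for the details.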

Simon.

----- Original Message ----- 
From: "Edwin Sendjaja" <edwin7 at web.de>
To: "Simon Pickett" <simon.pickett at bto.org>
Cc: <r-help at r-project.org>
Sent: Tuesday, January 06, 2009 5:04 PM
Subject: Re: [R] Large Dataset


> Hi Simon,
>
> Is SAS more powerful than R?
>
> Well, I think I cannot afford to buy SAS.
>
> Actually, my computer isn't really slow. I think 4 GB of RAM is big enough
> for a personal PC. I am just wondering why R runs so slowly with these
> specs when handling a 3 GB data set. What if the data set were 1 TB? Hmm...
>
>
> Edwin
>
>> Hi,
>>
>> I am not very knowledgeable about this kind of stuff, but my guess is that
>> if you have a fairly slow computer and massive data sets, there isn't a lot
>> you can do except get a better computer, buy more RAM, or use something
>> like SAS instead?
>>
>> Hopefully someone else will chip in, Edwin. Best of luck.
>>
>> Simon.
>>
>>
>> ----- Original Message -----
>> From: "Edwin Sendjaja" <edwin7 at web.de>
>> To: "Simon Pickett" <simon.pickett at bto.org>
>> Cc: <r-help at r-project.org>
>> Sent: Tuesday, January 06, 2009 2:53 PM
>> Subject: Re: [R] Large Dataset
>>
>> > Hi Simon,
>> >
>> > My RAM is only 3.2 GB (actually it should be 4 GB, but my motherboard
>> > doesn't support it).
>> >
>> > R uses almost all of my RAM and half of my swap. I think memory.limit
>> > will not solve my problem. It seems that I need more RAM.
>> >
>> > Unfortunately, I can't buy more RAM.
>> >
>> > Why is R slow at reading a big data set?
>> >
>> >
>> > Edwin
>> >
>> >> Only a couple of weeks ago I had to deal with this.
>> >>
>> >> Adjust the memory limit as follows, although you might not want 4000;
>> >> that is quite high...
>> >>
>> >> memory.limit(size = 4000)
>> >>
>> >> Simon.
>> >>
>> >> ----- Original Message -----
>> >> From: "Edwin Sendjaja" <edwin7 at web.de>
>> >> To: "Simon Pickett" <simon.pickett at bto.org>
>> >> Cc: <r-help at r-project.org>
>> >> Sent: Tuesday, January 06, 2009 12:24 PM
>> >> Subject: Re: [R] Large Dataset
>> >>
>> >> > Hi Simon,
>> >> >
>> >> > Thanks for your reply.
>> >> > I have read ?Memory, but I don't understand how to use it. I am not
>> >> > sure if that can solve my problem. Can you tell me in more detail?
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Edwin
>> >> >
>> >> >> type
>> >> >>
>> >> >> ?memory
>> >> >>
>> >> >> into R and that will explain what to do...
>> >> >>
>> >> >> S
>> >> >> ----- Original Message -----
>> >> >> From: "Edwin Sendjaja" <edwin7 at web.de>
>> >> >> To: <r-help at r-project.org>
>> >> >> Sent: Tuesday, January 06, 2009 11:41 AM
>> >> >> Subject: [R] Large Dataset
>> >> >>
>> >> >> > Hi all,
>> >> >> >
>> >> >> > I have a 3.1 GB data set (with 11 columns and lots of data in int
>> >> >> > and string).
>> >> >> > If I use read.table, it takes very long. It seems that my RAM is
>> >> >> > not big enough (overloaded). I have 3.2 GB RAM and 7 GB swap,
>> >> >> > 64-bit Ubuntu.
>> >> >> >
>> >> >> > Is there a good solution for reading large data into R? I have
>> >> >> > seen that people suggest using the bigmemory package or ff, but
>> >> >> > they seem very complicated. I don't know how to start with those
>> >> >> > packages.
>> >> >> >
>> >> >> > I have tried to use bigmemory, but I got some kind of errors. Then
>> >> >> > I gave up.
>> >> >> >
>> >> >> >
>> >> >> > Can someone give me a simple example of how to use ff or
>> >> >> > bigmemory? Or maybe a better solution?
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > Thank you in advance,
>> >> >> >
>> >> >> >
>> >> >> > Edwin
>> >> >> >
>> >> >> > ______________________________________________
>> >> >> > R-help at r-project.org mailing list
>> >> >> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >> >> > PLEASE do read the posting guide
>> >> >> > http://www.R-project.org/posting-guide.html
>> >> >> > and provide commented, minimal, self-contained, reproducible 
>> >> >> > code.
>
>
>



