[R] size limitations in R

Wensui Liu liuwensui at gmail.com
Fri Aug 31 19:35:08 CEST 2007


Couldn't agree more with Daniel.
I love SQLite and use it to exchange data between R, Python, and
SAS. Data stored in SQLite is far easier to work with than CSV, because
data attributes such as column names and types are preserved.
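
To illustrate the point about attributes, here is a minimal Python sketch
using only the standard library. The table name, column names, and sample
rows are made up for the example. It sends the same rows through SQLite
and through CSV text and shows what comes back.

```python
import csv
import io
import sqlite3

# Made-up sample rows: an integer, a float (one missing), and a string.
rows = [(1, 2.5, "a"), (2, None, "b")]

# Round trip through an in-memory SQLite database with declared column types.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, x REAL, label TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
from_sqlite = con.execute("SELECT id, x, label FROM t ORDER BY id").fetchall()
con.close()

# Round trip through CSV text held in memory.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
buf.seek(0)
from_csv = [tuple(r) for r in csv.reader(buf)]

print(from_sqlite)  # integers, floats and the missing value survive
print(from_csv)     # every field comes back as a string
```

With CSV the reader has to guess the types again on every import; with
SQLite the receiving program gets them from the file.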

On 8/31/07, Daniel Lakeland <dlakelan at street-artists.org> wrote:
> On Fri, Aug 31, 2007 at 01:31:12PM +0100, Fabiano Vergari wrote:
>
> > I am a SAS user currently evaluating R as a possible addition or
> > even replacement for SAS. The difficulty I have come across straight
> > away is R's apparent difficulty in handling relatively large data
> > files. Whilst I would not expect it to handle datasets with millions
> > of records, I still really need to be able to work with datasets with
> > 100,000+ records and 100+ variables. Yet, when reading a .csv file
> > with 180,000 records and about 200 variables, the software virtually
> > ground to a halt (I stopped it after 1 hour). Are there guidelines
> > or maybe a limitations document anywhere that helps me assess the
> > size
>
> 180k records with 200 variables = 36 million entries, if they're
> numeric then they're doubles taking up 8 bytes, so 288 MB of RAM. This
> should be perfectly fine for R, as long as you have that much free
> RAM.
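
The estimate above is easy to check. This is only the arithmetic from the
paragraph, written out in Python; it assumes every column is stored as an
8-byte double and ignores any extra working memory the import step may
need while parsing.

```python
n_rows = 180_000        # records in the CSV file
n_cols = 200            # variables per record
bytes_per_double = 8    # size of one double-precision value

n_values = n_rows * n_cols
n_bytes = n_values * bytes_per_double

print(n_values)         # 36000000
print(n_bytes / 1e6)    # 288.0 (megabytes, decimal)
```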
>
> However, the routines that read CSV and tab-delimited files are
> relatively inefficient for such large files.
>
> In order to handle large data files, it is better to use one of the
> database interfaces. My preference would be sqlite unless I already
> had the data on a mysql or other database server.
>
> The documentation for the packages RSQLite and SQLiteDF should be
> helpful, as well as the documentation for SQLite itself, which has a
> facility for efficiently importing CSV and similar files directly to a
> SQLite database.
>
> e.g.: http://netadmintools.com/art572.html
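
The sqlite3 command-line shell has an .import command for this. For anyone
who prefers to script the load, here is a rough Python sketch of the same
idea using only the standard library. The function name csv_to_sqlite, the
table name, and the batch size are hypothetical choices for the example.
It assumes the first CSV line holds the column names, and it creates the
columns without declared types.

```python
import csv
import io
import itertools
import sqlite3


def csv_to_sqlite(csv_file, con, table, batch_size=10_000):
    """Stream rows from an open CSV file into a new SQLite table in batches."""
    reader = csv.reader(csv_file)
    header = next(reader)  # first line is assumed to hold column names
    cols = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    con.execute(f'CREATE TABLE "{table}" ({cols})')
    insert = f'INSERT INTO "{table}" VALUES ({marks})'
    total = 0
    while True:
        # Read at most batch_size rows so the whole file is never in memory.
        batch = list(itertools.islice(reader, batch_size))
        if not batch:
            break
        con.executemany(insert, batch)
        total += len(batch)
    con.commit()
    return total


# Small in-memory stand-in for a large CSV file on disk.
sample = io.StringIO("id,x\n1,2.5\n2,3.5\n3,4.5\n")
con = sqlite3.connect(":memory:")
n = csv_to_sqlite(sample, con, "big", batch_size=2)
print(n)
print(con.execute('SELECT COUNT(*) FROM "big"').fetchone()[0])
```

Once the table exists, R can pull just the rows and columns it needs
through one of the database interfaces instead of parsing the whole CSV.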
>
>
>
> --
> Daniel Lakeland
> dlakelan at street-artists.org
> http://www.street-artists.org/~dlakelan


-- 
===============================
"I am dying with the help of too many
physicians." - Alexander the Great, on his deathbed
===============================
WenSui Liu
(http://spaces.msn.com/statcompute/blog)


