[R] Large data sets with R (binding to hadoop available?)
Rory.WINSTON at rbs.com
Fri Aug 22 12:25:34 CEST 2008
Hi
Apart from database interfaces such as sqldf which Gabor has mentioned, there are also packages specifically for handling large data: see the "ff" package, for instance.
I am currently experimenting with parallelizing R computations via Hadoop. I haven't looked at Pig yet, though.
Rory
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Roland Rau
Sent: 21 August 2008 20:04
To: Avram Aelony
Cc: r-help at r-project.org
Subject: Re: [R] Large data sets with R (binding to hadoop available?)
Hi
Avram Aelony wrote:
>
> Dear R community,
>
> I find R fantastic and use R whenever I can for my data analytic needs.
> Certain data sets, however, are so large that other tools seem to be
> needed to pre-process data such that it can be brought into R for
> further analysis.
>
> Questions I have for the many expert contributors on this list are:
>
> 1. How do others handle situations of large data sets (gigabytes,
> terabytes) for analysis in R ?
>
I usually try to store the data in an SQLite database and interface via functions from the packages RSQLite (and DBI).
No idea about Question No. 2, though.
Hope this helps,
Roland
P.S. When I am sure that I only need a certain subset of large data sets, I still prefer to do some pre-processing in awk (gawk).
2.P.S. My data sets are in the gigabyte range (not the terabyte range). This may matter if your data sets are *really large* and you want to use SQLite; see http://www.sqlite.org/whentouse.html
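To illustrate the awk pre-processing mentioned in the first P.S., here is a minimal sketch. The file names, column layout, and the "north" filter are made up for the example and are not taken from anyone's actual data; the point is only that awk streams the file line by line, so a subset can be written out without loading everything into memory.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)

# Hypothetical input: a tiny CSV standing in for a much larger file.
cat > "$workdir/big.csv" <<'EOF'
id,region,value
1,north,10.5
2,south,3.2
3,north,7.1
4,east,9.9
EOF

# Keep the header line (NR == 1) plus every row whose second
# comma-separated field is exactly "north".
awk -F',' 'NR == 1 || $2 == "north"' "$workdir/big.csv" > "$workdir/subset.csv"

cat "$workdir/subset.csv"
```

The resulting subset.csv is small enough to read into R in the usual way, e.g. with read.csv().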
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.