[R-sig-hpc] R on Amazon Web Service question

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Sun Jul 17 16:59:08 CEST 2011


On Sun, Jul 17, 2011 at 8:30 AM, Takatsugu Kobayashi
<taquito2007 at gmail.com> wrote:
> Hi
>
> I am a newbie to R/AWS, so please forgive my basic question. I
> posted this question in the AWS Forum but haven't heard from the
> community yet.
>
> My boss asked me to set up a system to process a large dataset and
> analyze it statistically. The input data ranges from 5 GB to 10 GB.
>
> The combinations of AWS products I can think of include
>
> - EC2 high memory instance for data processing
> - RDS Oracle DB for data processing
>
> and
>
> - EC2 medium-large CPU instances for data analysis with R
> - MapReduce for data analysis with R
>
>  and
>
> - S3
>
> Now I have two questions regarding R and AWS:
>
> 1. Is ONE medium-high CPU instance enough for statistical analysis of
> a dataset that large?
> 2. Should I subscribe to the same number of MapReduce services as
> EC2 instances?

There is no way we can advise you on this without more details.  It
seems the data is stored in an Oracle database?  But the most
important thing is what analysis you want to do and how fast you want
it done (and how much time you want to spend coding).  The beauty of
AWS is that it is easy for you to experiment with different setups.

Kasper


