[R-sig-hpc] distributed R on EC2, designing the software stack

Stephen J. Barr stephenjbarr at gmail.com
Wed May 13 23:14:05 CEST 2009


Thanks for doing that. I am looking forward to giving it a try.
==========================================
Stephen J. Barr
University of Washington
WEB: www.econsteve.com
==========================================



On Wed, May 13, 2009 at 2:09 PM, Saptarshi Guha
<saptarshi.guha at gmail.com> wrote:
> Hello,
> I'll be releasing easy examples (and install scripts) for RHIPE on EC2 later
> this week. I have tested it on EC2 and it works smoothly; however, I
> haven't experimented with S3 datasets.
>
> I'll send the link to the examples at the end of the week.
>
> Currently, I'm refactoring the code and it should be working in a day (I
> hope!).
>
>
> Saptarshi
>
>
> On Wed, May 13, 2009 at 4:40 PM, Avram Aelony <AvramAelony at eharmony.com> wrote:
>
>> Hello,
>>
>> Is there documentation on how to use RHIPE with Amazon's EC2 using data
>> from S3?
>>
>> Thank you,
>> Avram
>>
>>
>>
>>
>> -----Original Message-----
>> From: r-sig-hpc-bounces at r-project.org
>> [mailto:r-sig-hpc-bounces at r-project.org] On Behalf Of Saptarshi Guha
>> Sent: Saturday, May 02, 2009 12:01 PM
>> To: malcolm Crouch
>> Cc: R-sig-hpc at r-project.org
>> Subject: Re: [R-sig-hpc] distributed R on EC2, designing the software stack
>>
>> Hello,
>> I've been testing R across clusters and will run some more in-depth
>> tests over the weekend.
>> However, the cost depends on which instance types you use (types and cost:
>> http://aws.amazon.com/ec2/instance-types/)
>>
>> I used a cluster of 15 "standard large" instances for less than 1 hr,
>> which cost me $6. Intra-EC2 bandwidth is free (within the same region),
>> and downloading from outside EC2 costs ~$0.10 per GB. So, yes,
>> depending on the budget (I'm a grad student and it is turning out to
>> be expensive), it can easily run into hundreds of dollars a month
>> if used regularly.
>>
>> Keep in mind that if you have a web service running on an instance and
>> check the website (e.g. cluster status monitoring) from outside EC2,
>> an expense is incurred. Also, you are charged as soon as an instance is
>> up and running. Do not forget to terminate your instances!
>>
>> I'll be tabulating my performance and expenditure in the coming week.
>>
>> Hope it helps
>> Saptarshi Guha
>>
>>
>>
>> On Sat, May 2, 2009 at 2:03 PM, malcolm Crouch
>> <malcolm.croucher at gmail.com> wrote:
>> >
>> > Hi,
>> >
>> >
>> > I have looked at the pricing for Amazon; however, I am wondering, from a
>> > practical point of view, roughly what your monthly costs are?
>> >
>> > I am interested in using this service as I don't have access to a cluster
>> > but need to do a feasibility study ... therefore some ideas on costing
>> > would be useful.
>> >
>> > Regards
>> >
>> > Malcolm
>> > On Wed, Apr 29, 2009 at 9:36 PM, Saptarshi Guha
>> > <saptarshi.guha at gmail.com> wrote:
>> >>
>> >> Hello,
>> >> Yes, I was playing with EC2 and RHIPE last night. Just got permission
>> >> to increase my instances to 100!
>> >> The details (what I know):
>> >> RHIPE is based on Hadoop and R. Cloudera has a very easy-to-use AMI
>> >> for small and large (32/64-bit) instances. It is easy enough to install
>> >> the Cloudera AMI.
>> >> However, it does not come with R.
>> >>
>> >> Last night, I modified their scripts to yum install R (using yum we
>> >> get R-2.6) on each machine. This results in ~21MB of downloads
>> >> on the machines[1], which is not expensive but is not the best way to do
>> >> things.
>> >>
>> >> Once booted, each machine installs R and Rserve, and one machine (the
>> >> master) installs RHIPE.
>> >> I did it with 1 master and 1 tasktracker and RHIPE worked. I intend to
>> >> check with 30+ instances to see how things scale.
>> >>
>> >> I have emailed Cloudera asking them to bundle R with their Hadoop AMI
>> >> so that users incur a minimal expense.
>> >>
>> >> I will be posting EC2 instructions for RHIPE later this week.
>> >> Given the reasonable cost of EC2, it would be a great way for users to
>> >> test out distributed computing with R. Maybe as part of the R
>> >> community we could host a Linux AMI? Again, cost is the issue
>> >> here (I would rather not pay for users downloading things).
>> >>
>> >> Regards
>> >> Saptarshi Guha
>> >> [1] Not quite sure how the AMIs work - if 10 AMIs belong to one
>> >> group, does EC2 boot up one and replicate the booted instance? If so,
>> >> then there is only one download; if not, each machine downloads.
>> >>
>> >>
>> >> On Wed, Apr 29, 2009 at 3:24 PM, Whit Armstrong
>> >> <armstrong.whit at gmail.com> wrote:
>> >> > you should contact Robert Grossman who just gave a presentation on
>> >> > this topic at R/Finance in Chicago.
>> >> >
>> >> > link: http://rinfinance.quantmod.com/speakers/
>> >> >
>> >> > -Whit
>> >> >
>> >> >
>> >> > On Wed, Apr 29, 2009 at 3:06 PM, Stephen J. Barr
>> >> > <stephenjbarr at gmail.com> wrote:
>> >> >> Greetings,
>> >> >>
>> >> >> I am trying to get into distributed computing with R, but do not have
>> >> >> access to a cluster. Therefore, I am trying to get distributed R
>> >> >> running on Amazon's EC2. ( http://aws.amazon.com/ec2/ )
>> >> >>
>> >> >> For those of you who don't know, EC2 allows you to instantiate large
>> >> >> numbers of computers, bundled with whatever OS and software
>> >> >> configuration you want. From my survey of things, there are a lot of
>> >> >> different options available for distributed computing. For my needs,
>> >> >> I would just like to run simple Monte Carlo simulations, and other
>> >> >> things that don't require a ton of inter-node communication.
>> >> >>
>> >> >> What I would like to do is put together a public AMI and a howto
>> >> >> guide, such that it would be very easy for anyone to instantiate an
>> >> >> N-node cluster and start with parallel computing. I would like to
>> >> >> have a discussion/brainstorm over what the exact software stack
>> >> >> should be.
>> >> >>
>> >> >> My initial thoughts were:
>> >> >>
>> >> >> 1) R 2.9.0 + OpenMPI + Rmpi + Snowfall/sfCluster
>> >> >>   - Will Amazon's network work with OpenMPI? Perhaps it would be
>> >> >> better to use PVM or something that is more tolerant of a non-optimal
>> >> >> network.
>> >> >>
>> >> >> 2)  R 2.9.0 + "socket based communication" + Snowfall/sfCluster
>> >> >>  - Is this scalable?
>> >> >>
>> >> >> 3)  R 2.9.0 + Twisted + NetWorkSpaces
>> >> >>   - Not sure if Amazon's network supports broadcast mode, which is
>> >> >> required by Twisted.
>> >> >>
>> >> >> 4) Biocep-R
>> >> >>   - this looks like it has the functionality to do what I want, but a
>> >> >> lot of other stuff as well.
>> >> >>
>> >> >> 5) RHIPE
>> >> >>   - Hadoop is well supported by EC2. Perhaps this is the way to go.
>> >> >> Seems like a very new package :)
>> >> >>
>> >> >> What are people's thoughts on what would be a good software stack,
>> >> >> with the constraint that it should be simple and run on EC2?
>> >> >>
>> >> >> Thanks,
>> >> >> -stephen
>> >> >> ==========================================
>> >> >> Stephen J. Barr
>> >> >> University of Washington
>> >> >> WEB: www.econsteve.com
>> >> >>
>> >> >> _______________________________________________
>> >> >> R-sig-hpc mailing list
>> >> >> R-sig-hpc at r-project.org
>> >> >> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>> >> >>
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Malcolm A.B Croucher
>> >
>>
>
>
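A rough sketch of the cost arithmetic discussed in the thread, for anyone doing a similar feasibility estimate. It uses only the two figures quoted above (15 large instances for under an hour at $6, and ~$0.10 per GB downloaded from outside EC2). The per-hour rate is derived by assuming that the $6 covered exactly one billed hour per instance; the function name and the usage pattern in the example are hypothetical, and current prices should be checked against Amazon's pricing page.

```python
# Rough EC2 cost estimate built from the figures quoted in this thread.
# Assumption: the $6 bill for 15 large instances covers one billed hour each.

INSTANCES_IN_TEST = 15
TEST_COST_USD = 6.0
HOURLY_RATE_USD = TEST_COST_USD / INSTANCES_IN_TEST  # about $0.40 per instance-hour
TRANSFER_USD_PER_GB = 0.10  # "~$0.1 per GB" for downloads from outside EC2


def monthly_cost(instances, hours_per_day, days_per_month, transfer_gb=0.0):
    """Return an estimated monthly bill in USD (compute plus external transfer)."""
    compute = instances * hours_per_day * days_per_month * HOURLY_RATE_USD
    transfer = transfer_gb * TRANSFER_USD_PER_GB
    return compute + transfer


if __name__ == "__main__":
    # Hypothetical pattern: 15 instances, 2 hours a day, 20 days a month,
    # plus 10 GB downloaded from outside EC2.
    print(f"${monthly_cost(15, 2, 20, transfer_gb=10):.2f}")
```

Under these assumptions, regular use of a 15-node cluster lands in the "hundreds of dollars a month" range mentioned above; instances left running when idle would add to the compute term.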

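The workload described in the original post (simple Monte Carlo simulations with little inter-node communication) can be illustrated with a minimal sketch. This is not code from any of the packages named in the thread; it is written in Python only to show the shape of the problem, and the function names, the pi-estimation task, and the per-task integer seeds are all assumptions chosen for illustration. Each chunk runs independently, so the only communication is sending out task descriptions and collecting one number per task.

```python
import random
from multiprocessing import Pool


def simulate_chunk(args):
    """Run one independent batch of draws; return the hits inside the unit circle."""
    seed, n_draws = args
    rng = random.Random(seed)  # separate seeded generator per task (simple, not rigorous)
    hits = 0
    for _ in range(n_draws):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_tasks, draws_per_task, mapper=map):
    """Split the simulation into n_tasks independent chunks and combine the counts."""
    tasks = [(seed, draws_per_task) for seed in range(n_tasks)]
    total_hits = sum(mapper(simulate_chunk, tasks))
    return 4.0 * total_hits / (n_tasks * draws_per_task)


if __name__ == "__main__":
    # Local stand-in for a cluster: 4 worker processes on one machine.
    with Pool(processes=4) as pool:
        print(estimate_pi(8, 100_000, mapper=pool.map))
```

The `mapper` argument is the only part that changes between serial and parallel runs, which is roughly the role a cluster-aware apply function plays in the stacks listed above.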

