[R-sig-hpc] doRedis on ec2: no new jobs being assigned to workers after first job completes

M. Edward (Ed) Borasky znmeb at znmeb.net
Thu Aug 23 18:31:45 CEST 2012


Could this be a Redis problem? I'm on the Redis mailing list (
redis-db at googlegroups.com) and they have some workarounds for common
Amazon EC2 issues.
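
For example, one workaround that often comes up for long-running jobs (an assumption here, not a confirmed diagnosis of this particular problem) concerns the Redis `timeout` directive: with a non-zero value, Redis closes client connections it considers idle, and a worker blocked on a 15-minute task can look idle to the server. Disabling the idle timeout in redis.conf keeps those connections open:

```conf
# redis.conf -- sketch of a commonly suggested workaround, assuming the
# stock config file layout

# 0 disables the closing of idle client connections entirely
timeout 0

# TCP keepalive (Redis 2.6 and later) helps detect dead peers on EC2,
# where idle TCP connections may be silently dropped by the network
tcp-keepalive 60
```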

On Thu, Aug 23, 2012 at 6:00 AM, OpenTrades <jan at opentrades.nl> wrote:
> I have been trying to use the R/Quantstrat applyParameters() function to
> analyze the performance of 1200 different Moving Average parameter
> combinations in the "Luxor" trading strategy. As I want to test with 6
> years of GBP/USD pricing data at a 30-minute interval, each individual
> test takes 15 minutes or more, so this is where doRedis comes into
> play.
>
> The platform I am using is Amazon EC2. I configure one Redis server
> as a master, and from this master I also run the R program. Then I start
> up a bunch of slaves, each with 8 CPUs and 7GB of memory
> (c1.xlarge in Amazon EC2 parlance), which automatically connect to the
> redis server, so I end up with a number of workers that is a multiple of 8.
>
> Now if I test with only 10 days of data (instead of 6 years), or even 3
> months, everything is great, whether I use 1 slave or 40 slaves (= 320
> workers). But whenever I run with more than 1 year of data, problems
> start: workers are no longer assigned a new job after the first one
> completes, and the worker count decreases in the redis-server window.
> Whenever this occurs, if I restart the workers manually, they log on to
> the redis server again and get assigned a new job; however, once again
> no new job is assigned to them after completion.
>
> It seems that this has something to do with the amount of time it
> takes to complete the task. When renting multiple EC2 machines of the
> same instance type, they often turn out to be slightly different
> nevertheless. And I found that, when running with 15 months of price
> data, one particular type (CPU freq = 2.13GHz / cache size = 4096KB) gets
> assigned one job for each of its 8 workers, but then none to any worker
> anymore, whereas another, much faster type (CPU freq = 2.33GHz / cache
> size = 6144KB) runs continuously without a flaw. FYI: all slaves share
> the same identical disk image.
>
> Does anyone have any ideas on how to proceed and fix this? Or a test I
> can run to find out what is going wrong?
>
> --
> Jan Humme - OpenTrades
>
> WWW:     http://www.opentrades.nl
> Email:   jan at opentrades.nl
> Twitter: @opentrades
>
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
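
For anyone reproducing this kind of setup, the master/worker arrangement described above can be sketched with doRedis roughly as follows. This is a minimal sketch: the queue name "jobs", the host addresses, and the trivial loop body are placeholders, not the actual Luxor code.

```r
library(doRedis)   # also loads foreach and rredis

# --- on the master (the machine running Redis and the R program) ---
registerDoRedis("jobs", host = "localhost")   # "jobs" is a placeholder queue name

# each iteration becomes a task on the "jobs" queue; with long-running
# tasks, each worker should pick up a new task as soon as it finishes one
result <- foreach(i = 1:1200, .combine = c) %dopar% {
  Sys.sleep(1)   # stand-in for a long applyParameters() run
  i^2
}

removeQueue("jobs")   # clean up the queue when done

# --- on each slave (8 workers per c1.xlarge), pointing at the master ---
# startLocalWorkers(n = 8, queue = "jobs", host = "master-ip")
```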



-- 
Twitter: http://twitter.com/znmeb; Computational Journalism Publishers
Workbench: http://j.mp/QCsXOr

How the Hell can the lion sleep with all those people singing "A weem
oh way!" at the top of their lungs?
