[R-sig-hpc] doRedis on EC2: no new jobs being assigned to workers after first job completes

Thu Aug 23 19:10:30 CEST 2012

On 23-08-12 18:31, M. Edward (Ed) Borasky wrote:
> Could this be a Redis problem? I'm on the Redis mailing list (
> redis-db at googlegroups.com) and they have some workarounds for common
> Amazon EC2 issues.

Possibly. I will check the Redis list for clues as well; thank you.

>
> On Thu, Aug 23, 2012 at 6:00 AM, OpenTrades <jan at opentrades.nl> wrote:
>> I have been trying to use the R/Quantstrat applyParameters() function to
>> analyze the performance of 1200 different Moving Average parameter
>> combinations in the "Luxor" trading strategy. As I want to test with 6
>> years of GBP/USD pricing info at a 30-minute interval, each individual
>> test requires 15 minutes or more, so this is where doRedis comes into
>> play.
>>
>> The platform that I am using is Amazon EC2. I configure one Redis server
>> as a master, and from this master I also run the R program. Then I start
>> up a bunch of slaves, each with 8 CPUs and 7GB of memory (c1.xlarge in
>> Amazon EC2 parlance), which automatically connect to the Redis server,
>> so I end up with a number of workers that is a multiple of 8.
>>
>> Now if I test with only 10 days of data (instead of 6 years), or even 3
>> months, everything works fine, whether I use 1 slave or 40 slaves (= 320
>> workers). But whenever I run with more than 1 year of data, I start
>> getting problems: workers are no longer assigned a new job after the
>> first one completes, and the worker count decreases in the redis-server
>> window. When this occurs and I restart the workers manually, they log on
>> to the Redis server again and get assigned a new job; however, once
>> again no new job is assigned to them after completion.
>>
>> It seems that this has something to do with the amount of time that it
>> takes to complete the task. Multiple EC2 machines rented as the same
>> instance type often turn out to be slightly different. I found that,
>> when running with 15 months of price data, one particular type (CPU
>> freq = 2.13GHz / cache size = 4096KB) gets assigned one job for each of
>> its 8 workers, but then none to any worker anymore, whereas another,
>> much faster type (CPU freq = 2.33GHz / cache size = 6144KB) runs
>> continuously without a hitch. FYI: all slaves share the same disk image.
>>
>> Does anyone have any ideas on how to proceed and fix this? Or tests I can
>> do to find out what goes wrong?
>>
>> --
>> Jan Humme - OpenTrades
>>
>> WWW:     http://www.opentrades.nl
>> Email:   jan at opentrades.nl
>> Twitter: @opentrades
>>
>> _______________________________________________
>> R-sig-hpc mailing list
>> R-sig-hpc at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>
>
