[R-sig-hpc] Making a series of similar, but modified .r files - suggested method(s)? Re: Running jobs on a Linux cluster

Patrick Brandt patrick.t.brandt at gmail.com
Mon Aug 23 02:36:25 CEST 2010


You can use R to dynamically generate the .R files and then process
them with Rmpi and snow.  See the examples on my replication page, or
the attached file, which generates three .R files of data and related
inputs for an R + JAGS session.  I would post this on the R-sig-hpc
list, but it is not an example I can provide in reproducible form.  In
my own workflow, I then dispatch the multiple .R files to separate
nodes for computation.
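
As a minimal sketch of the idea (the template, the parameter values,
and the file names below are illustrative assumptions, not the contents
of the attached file):

# generate-jobs.R: write one .R script per parameter setting
template <- 'setwd("%s")
load("%s")
for (i in 1:%d) {
  # ... one simulation replication ...
}
'

params <- data.frame(wd   = c("run1", "run2", "run3"),  # illustrative
                     data = c("d1.RData", "d2.RData", "d3.RData"),
                     x    = c(100, 200, 300),
                     stringsAsFactors = FALSE)

for (j in seq_len(nrow(params))) {
  writeLines(sprintf(template, params$wd[j], params$data[j], params$x[j]),
             sprintf("job%03d.R", j))
}

Each generated jobNNN.R is then self-contained, so it can be dispatched
to its own node.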

Let me know if you have questions.

PTB

-- 
Patrick Brandt
Political Science
School of Economic, Political and Policy Sciences
University of Texas at Dallas
Personal site: http://www.utdallas.edu/~pbrandt
MSBVAR site: http://yule.utdallas.edu

On Sat, Aug 21, 2010 at 1:03 PM, Laura S <leslaura at gmail.com> wrote:
> Thank you, Paul. I really appreciate your response! The scheduler is now
> set up so that each user (it is a small cluster) gets a maximum of 16
> processors at a time. I am using a Rocks Linux cluster.
>
> #### Making a series of similar, but modified .r files - suggested
> method(s)?:
>
> Any suggestions are much appreciated; my current method is clunky, but
> it will work. I am looking for a way to make a series of similar, but
> slightly modified, .r files.
>
> My issue is automating the creation of 320 .r files that change the
> for(i in 1:x) line in my base .r file (as well as other elements, e.g.,
> the load(...) and setwd(...) calls). For smaller jobs run on a single
> computer with batch files, I have been changing these lines by hand.
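>
> For example, the lines I edit by hand look like this (the path, file
> name, and the value of x are placeholders):
>
> setwd("run001")        # placeholder working directory
> load("data001.RData")  # placeholder data file
> for (i in 1:100) {     # the 1:100 upper limit changes across files
>   # ... simulation code ...
> }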
>
> Why does this matter to me? I am planning to run a simulation
> experiment on the Linux cluster as a serial job (for now that seems the
> quickest way to get things rolling on our cluster). Although it is not
> elegant, it has been suggested that I make 320 .r files, so that qsub
> runs one .r file and then picks up the next job. Thus, the manual route
> I am currently using would take a very long time (given multiple runs
> of 320 .r files for experimental replication).
>
> Thank you,
> Laura
>
> On Tue, Aug 10, 2010 at 9:57 AM, Paul Johnson <pauljohn32 at gmail.com> wrote:
>
>> On Tue, Aug 10, 2010 at 10:15 AM, Laura S <leslaura at gmail.com> wrote:
>> > Dear all:
>> >
>> > I would appreciate any help you are willing to offer. I have a
>> > simulation program that runs serially. However, I would like to run
>> > the jobs in such a way that when a simulation is finished, another
>> > job can begin to run. The simulations take different amounts of
>> > time, so it would be ideal to have a way to communicate that jobs
>> > are done, and to initiate new jobs. The Linux cluster IT staff at my
>> > institution do not have much documentation or experience with
>> > running R jobs. I am new to HPC, so my apologies for this
>> > potentially very basic inquiry.
>> >
>> > Thank you for your time and consideration,
>> > Laura
>> >
>>
>> You don't give us much to go on. What scheduler does your cluster use,
>> for example?
>>
>> Here's what I'd do. Write a shell script that runs all of the programs
>> one after the other.  Without knowing more about the scheduling scheme
>> on your cluster, I can't say exactly how I would go about it.
>>
>> If you have access to a bash shell, for example, it should be as simple as:
>>
>> #!/bin/bash
>>
>> R --vanilla -f yourRprogram1.R
>>
>> R --vanilla -f yourRprogram2.R
>>
>> =====================
>>
>> and so forth. If you rewrite the first line of your R code to use
>> Rscript or littler, then you don't even need to bother with the "R
>> --vanilla -f" part, as each R program will become self-aware (and take
>> over the world, like in Terminator).
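>>
>> For instance, a minimal sketch of a self-running script
>> (yourRprogram1.R and its one-line body are placeholders; the #! line
>> is the standard Rscript idiom):
>>
>> #!/usr/bin/env Rscript
>> # yourRprogram1.R: make it executable once with chmod +x; after
>> # that it runs on its own as ./yourRprogram1.R
>> cat("simulation 1 starting\n")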
>>
>> If you run exactly the same R program over and over again, make a for loop.
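>>
>> For instance (a sketch; driver.R and yourRprogram.R are hypothetical
>> names; source() evaluates the file in the global environment, so
>> each run can read the current value of i):
>>
>> # driver.R: run the same program 320 times with a changing index
>> for (i in 1:320) {
>>   source("yourRprogram.R")  # the sourced code sees this i
>> }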
>>
>> As long as you have the details worked out for each individual run of
>> the model, the rest of it is not really even a "cluster" problem; you
>> just have to run the jobs one after the other.
>>
>> FYI, I've been uploading practical working examples for our Rocks
>> Linux cluster using the Torque/OpenPBS scheduling system.  Maybe some
>> will help you.
>>
>> http://pj.freefaculty.org/cgi-bin/mw/index.php?title=Cluster:Main
>>
>> I think I could work out an example of the sort you describe if you
>> tell us a bit more about how the separate simulation runs talk to each
>> other.
>>
>> Or, I should add, if the runs go one after the other, why don't you
>> just put them all in one R program?
>>
>> --
>> Paul E. Johnson
>> Professor, Political Science
>> 1541 Lilac Lane, Room 504
>> University of Kansas
>>
>
>
>
> --
> " Genius is the summed production of the many with the names of the few
> attached for easy recall, unfairly so to other scientists"
>
> - E. O. Wilson (The Diversity of Life)
>
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MVPLN-VAR-monthly.R
Type: application/octet-stream
Size: 5809 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-hpc/attachments/20100822/ba5c97a9/attachment.obj>

