[R-sig-hpc] Making a series of similar, but modified .r files- suggested method(s)? Re: Running jobs on a linux cluster

Patrick Brandt patrick.t.brandt at gmail.com
Mon Aug 23 02:36:25 CEST 2010

You can use R to generate the .R files dynamically and then process
them with Rmpi and snow.  See the examples on my replication page, or
the following file, which generates three .R files of data and related
files for an R + JAGS session.  I would post it on the R-hpc list, but
it is not something I can provide as a reproducible example.  In what I
do, I then dispatch the multiple .R files to separate nodes for
computation.
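[Editor's note: a minimal sketch of the file-generation idea, written here in shell with sed rather than in R; the template file name and the @@RUN@@ placeholder token are invented for illustration, not taken from the scrubbed attachment.]

```shell
#!/bin/sh
# Generate numbered .R files from a template in which the token
# @@RUN@@ marks the per-file value (e.g. the bound in for(i in 1:x)).
# template.R and @@RUN@@ are illustrative names, not from the thread.
cat > template.R <<'EOF'
setwd("run@@RUN@@")
for (i in 1:@@RUN@@) {
  # simulation body goes here
}
EOF

for n in 1 2 3; do
  sed "s/@@RUN@@/$n/g" template.R > "sim$n.R"
done

grep 'for (i in' sim3.R   # prints: for (i in 1:3) {
```

The same loop scales to 320 files by changing the list of run numbers.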

Let me know if you have questions.


Patrick Brandt
Political Science
School of Economic, Political and Policy Sciences
University of Texas at Dallas
Personal site: http://www.utdallas.edu/~pbrandt
MSBVAR site: http://yule.utdallas.edu

On Sat, Aug 21, 2010 at 1:03 PM, Laura S <leslaura at gmail.com> wrote:
> Thank you Paul. I really appreciate your response! The scheduler is now set
> up such that each user (it is a small cluster) gets a maximum of 16
> processors at a time. I am using a Rocks Linux cluster.
> #### Making a series of similar, but modified .r files- suggested
> method(s)?:
> Any suggestions are much appreciated, given my current clunky (but workable)
> method. I am looking for a way to make a series of similar, but slightly
> modified, .r files.
> My issue is automating the creation of 320 .r files that change the for(i in
> 1:x) in my base .r file (as well as other elements, e.g., the load(...) and
> setwd(...) calls). For smaller jobs running on a single computer with batch
> files, I have been manually changing the for(i in 1:x) line, etc.
> Why does this matter to me? I am planning to run a simulation experiment
> on the linux cluster as a serial job (for now, this seems the quickest way to
> get things rolling on our cluster). Although not elegant, it has been
> suggested that I make 320 .r files so qsub runs one .r file and then selects
> other jobs. Thus, the manual route I am currently using would take a very
> long time (given multiple runs of 320 .r files for experimental
> replication).
> Thank you,
> Laura
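[Editor's note: once the 320 generated files exist, queuing them is itself a short loop. A sketch follows; the qsub call is commented out because it only works on the cluster, and the sim$n.R / run_one.pbs names are assumptions, not from the thread.]

```shell
#!/bin/sh
# Sketch: hand each of the 320 generated .R files to the scheduler
# in turn. The qsub line is commented out since it requires a live
# Torque/PBS queue; file names here are illustrative only.
count=0
for n in $(seq 1 320); do
  count=$((count + 1))
  # qsub -N "sim$n" -v SCRIPT="sim$n.R" run_one.pbs
done
echo "$count"   # prints: 320
```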
> On Tue, Aug 10, 2010 at 9:57 AM, Paul Johnson <pauljohn32 at gmail.com> wrote:
>> On Tue, Aug 10, 2010 at 10:15 AM, Laura S <leslaura at gmail.com> wrote:
>> > Dear all:
>> >
>> > I would appreciate any help you are willing to offer. I have a simulation
>> > program that runs serially. However, I would like to run the jobs in such a
>> > way that when one simulation finishes, another job can begin to run. The
>> > simulations take different amounts of time, so it would be ideal to have a
>> > way to communicate that jobs are done and to initiate new ones. The linux
>> > cluster IT staff at my institution do not have much documentation or
>> > experience with running R jobs. I am new to HPC, so my apologies for this
>> > potentially very basic inquiry.
>> >
>> > Thank you for your time and consideration,
>> > Laura
>> >
>> You don't give us much to go on. What scheduler does your cluster use,
>> for example?
>> Here's what I'd do. Write a shell script that runs all of the programs
>> one after the other.  Without knowing more about the scheduling scheme
>> on your cluster, I can't say exactly how I would go about it.
>> If you have access to a BASH shell, for example, it should be as simple as
>> #!/bin/bash
>> R --vanilla -f yourRprogram1.R
>> R --vanilla -f yourRprogram2.R
>> =====================
>> and so forth. If you rewrite the first line of your R code to use
>> Rscript or littler, then you don't even need to bother with the "R
>> --vanilla -f" part, as each R program will become self aware (and take
>> over the world, like in Terminator).
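[Editor's note: a sketch of the shebang approach mentioned above; hello.R is an invented name, and the script is only created and marked executable here, since actually running it requires R to be installed.]

```shell
#!/bin/sh
# Sketch: make an R script directly executable via an Rscript shebang,
# so the "R --vanilla -f" wrapper is no longer needed.
# hello.R is an illustrative name.
cat > hello.R <<'EOF'
#!/usr/bin/env Rscript
cat("simulation run complete\n")
EOF
chmod +x hello.R
# On a machine with R installed, ./hello.R now runs it directly.
head -1 hello.R   # prints: #!/usr/bin/env Rscript
```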
>> If you run exactly the same R program over and over again, make a for loop.
>> As long as you have the details worked out for each individual run of
>> the model, the rest is not really a "cluster" problem: you just have
>> to run them one after the other.
>> FYI, I've been uploading practical working examples for our Rocks
>> Linux cluster using the Torque/OpenPBS scheduling system.  Maybe some
>> will help you.
>> http://pj.freefaculty.org/cgi-bin/mw/index.php?title=Cluster:Main
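[Editor's note: a minimal Torque/OpenPBS submission file of the kind those examples cover might look like the sketch below; the job name, resource requests, and sim1.R are assumptions that will differ per cluster, not details from Paul's page.]

```shell
#!/bin/sh
# Sketch of a minimal Torque/OpenPBS submission script. The job name,
# resource lines, and sim1.R are illustrative assumptions only.
cat > sim1.pbs <<'EOF'
#!/bin/sh
#PBS -N sim1
#PBS -l nodes=1:ppn=1
#PBS -l walltime=12:00:00
cd "$PBS_O_WORKDIR"
R --vanilla -f sim1.R
EOF
# On the cluster this would be submitted with: qsub sim1.pbs
grep -c '^#PBS' sim1.pbs   # prints: 3
```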
>> I think I could work out an example of the sort you describe if you
>> tell us a bit more about how the separate simulation runs talk to each
>> other.
>> Or, I should add, if the runs go one after the other, why don't you
>> put them all in one R program?
>> --
>> Paul E. Johnson
>> Professor, Political Science
>> 1541 Lilac Lane, Room 504
>> University of Kansas
> --
> " Genius is the summed production of the many with the names of the few
> attached for easy recall, unfairly so to other scientists"
> - E. O. Wilson (The Diversity of Life)
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MVPLN-VAR-monthly.R
Type: application/octet-stream
Size: 5809 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-hpc/attachments/20100822/ba5c97a9/attachment.obj>
