[R] simulations with very large number of iterations (1 billion)

Viechtbauer Wolfgang (STAT) wolfgang.viechtbauer at maastrichtuniversity.nl
Fri Apr 15 10:41:06 CEST 2011


We do not know the details of the computations you intend to do within each iteration, but if, say, each iteration takes around 1 second, then 10^9 seconds is roughly 31.7 years on a single core. Even at a tenth of a second per iteration, you are still looking at more than 3 years. If you can parallelize things, you may be able to make this work within a realistic time frame, but that assumes access to dozens of cores.
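
As a quick sanity check, you can time a small batch and extrapolate. A minimal sketch (one_iter() is a hypothetical stand-in for your actual simulation step):

one_iter <- function() mean(rnorm(100))   # placeholder for the real step
elapsed <- system.time(for (i in 1:1e4) one_iter())["elapsed"]
(elapsed / 1e4) * 1e9 / (60 * 60 * 24 * 365.25)   # projected total, in years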

Good luck!

Best,

--
Wolfgang Viechtbauer
Department of Psychiatry and Neuropsychology
School for Mental Health and Neuroscience
Maastricht University, P.O. Box 616
6200 MD Maastricht, The Netherlands
Tel: +31 (43) 368-5248
Fax: +31 (43) 368-8689
Web: http://www.wvbauer.com



-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Brian J Mingus
Sent: Friday, April 15, 2011 08:29
To: Marion Dumas
Cc: r-help at r-project.org
Subject: Re: [R] simulations with very large number of iterations (1 billion)


On Thu, Apr 14, 2011 at 7:41 PM, Marion Dumas <mariouka at gmail.com> wrote:

> Hello R-help list
> I'm trying to run 1 billion iterations of a code with calls to random
> distributions to implement a data generating process and subsequent
> computation of various estimators that are recorded for further
> comparison of performance. I have two questions about how to achieve
> this: 1. the most important: on my laptop, R gives me an error message
> saying that it cannot allocate sufficient space for the matrix that is
> meant to record the results (a 1 billion by 4 matrix). Is this
> computer-specific? Are there ways to circumvent this limit? Or is it
> hopeless to run 1 billion iterations in one batch? (the alternative
> being to run, for example, 1000 iterations of a 1 million iteration
> process that spits out output files that can then be combined
> manually). 2. secondly: when I profile the code on a smaller number of
> iterations, it says that colSums is the function that has the longest
> self time. I am using this to compute stratum-specific treatment
> effects. My thinking was that the fastest way to compute mean outcome
> conditional on treatment for each stratum would be to combine all
> strata in one matrix and apply colSums-type functions on it. Maybe I
> am wrong and there are better ways?
>
> Thank you in advance for any help you may provide.
>
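On the second question, here is a minimal sketch of the strategy described above, with purely illustrative names: keep one column per stratum-by-treatment cell, so a single colSums() (or colMeans()) call produces all the conditional means in one pass.

n <- 1e5                              # iterations per cell (illustrative)
y <- matrix(rnorm(n * 4), ncol = 4)   # 4 stratum-by-treatment cells
colSums(y) / n                        # equivalent to colMeans(y)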


The first thing you need to do is estimate the amount of memory that will be needed. Then estimate the amount of time it will take. You probably need a 64-bit computer and at least 4-8 GB of memory. You may not want to use R at all, instead opting for C code and the GNU Scientific Library. If you can't write C code, Lua is pretty easy to learn, and GSL is exposed through it in GSL Shell: http://www.nongnu.org/gsl-shell/
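
The memory arithmetic for the results matrix is straightforward, and it points toward the chunked alternative mentioned in the original post. A sketch, assuming results can be appended to disk (run_chunk() is a hypothetical stand-in for one batch of the simulation):

## Memory for the results matrix alone: 1e9 rows x 4 cols of doubles
1e9 * 4 * 8 / 2^30   # about 29.8 GiB, well beyond a typical laptop

## Chunked alternative: run batches and append the results to disk
run_chunk <- function(n) matrix(rnorm(n * 4), ncol = 4)   # placeholder
for (k in 1:1000) {
  res <- run_chunk(1e6)   # 1000 chunks x 1e6 iterations = 1e9 total
  write.table(res, file = "results.txt", append = (k > 1),
              row.names = FALSE, col.names = FALSE)
}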


--
Brian Mingus
Graduate student
Computational Cognitive Neuroscience Lab
University of Colorado at Boulder
