[R] Simulation

Thu May 14 00:10:07 CEST 2009

On Wed, May 13, 2009 at 9:56 PM, Wacek Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
> Barry Rowlingson wrote:

>    library(rbenchmark)  # provides benchmark()
>    n = 1000
>    benchmark(columns=c('test', 'elapsed'), order=NULL,
>       'for'={ l = list(); for (i in 1:n) l[[i]] = rnorm(i, 0, 1) },
>       lapply=lapply(1:n, rnorm, 0, 1) )
>    #     test elapsed
>    # 1    for   9.855
>    # 2 lapply   8.923
>
>
>> Yes, you can probably vectorize this with lapply or something, but I
>> prefer clarity over concision when dealing with beginners...
>
> but where's the preferred clarity in the for loop solution?

 Seriously? You think:

 lapply(1:n, rnorm, 0, 1)

is 'clearer' than:

x=list()
for(i in 1:n){
  x[[i]]=rnorm(i,0,1)
}

for beginners?

 Firstly, 'lapply' doesn't have an intuitive name. Also, it takes a
function as an argument. The concept of passing a function as a
parameter to another function is something that a lot of programming
beginners have trouble with - unless they were brought up on LISP of
course, and few of us were.

 I propose that the for-loop example is clearer to a larger population
than the lapply version. Plus, the lapply version only works in that
form if the first parameter is the one you want to lapply over. If you
want to work over the third parameter, say, you then need:

 lapply(1:n,function(i){rnorm(100,0,i)})

 at which point you've introduced anonymous functions. The jump from:

 x[[i]] = rnorm(i,0,1)
to
 x[[i]] = rnorm(100,0,i)

is much less than the changes in the lapply version, where you have to
go 'oh hang on, lapply only works on the first argument, so you have
to write another function, but you can do that inline like this...'.
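
To make the comparison concrete, here is a small sketch putting both
forms side by side. The value n = 5 and the seed are arbitrary choices
for illustration, not anything from the benchmark above. Resetting the
seed before each version should give the same draws, so identical()
can confirm that the loop and the lapply call build the same list.

```r
# Illustrative values only: n and the seed are assumptions, not from the thread.
n <- 5

# for-loop version: element i holds i draws from N(0, 1)
set.seed(42)
x <- list()
for (i in 1:n) {
  x[[i]] <- rnorm(i, 0, 1)
}

# lapply version: each value of 1:n becomes rnorm's first argument
set.seed(42)
y <- lapply(1:n, rnorm, 0, 1)

# Varying the third argument (sd) instead: the loop changes by one line...
set.seed(42)
x3 <- list()
for (i in 1:n) {
  x3[[i]] <- rnorm(100, 0, i)
}

# ...while the lapply form needs an anonymous function
set.seed(42)
y3 <- lapply(1:n, function(i) rnorm(100, 0, i))

# Both pairs should hold the same values
stopifnot(identical(x, y), identical(x3, y3))
```

Either way you end up with a list of n numeric vectors; the only
difference is how much new machinery the reader has to absorb to get
there.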

Okay, maybe my example is a little contrived, but I still think that in
a beginner's context it's important not to jump too many paradigms at a
time.

B