[R] Efficiency Question - Nested lapply or nested for loop

epowell EPowell1 at med.miami.edu
Mon Oct 11 15:52:49 CEST 2010

Thank you both for your advice.  I ended up implementing both solutions and
testing them on a real dataset of 10,000 rows and 50 inds.  The results are
very, very interesting.

For some context, the original two approaches, nested lapply and nested for
loops, took 1.501529 and 1.458963 minutes, respectively.  So the for loops
were indeed a bit faster.

Next, I tried the index solution, which avoids calling paste() on every
iteration.  Strangely, this increased the time to 2.83 minutes.  Here's how
I implemented it:

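# build a vector of column indices
v = vector(mode="character", length=nind*4)
for (i in 0:(nind-1)) {
  v[(i*4+1):(i*4+4)] = c(paste("G_hat_0_",i,sep=""),
                         ...)  # remaining three column names omitted here
}
v = match(v, names(data))   # convert column names to column positions

# cmat: the count matrix, initialized earlier (not shown)
for (row in 1:nrow(data)) {
  for (i in 0:(nind-1)) {
    # which of the three scores for individual i is largest?
    Gmax = which.max(c(data[row, v[i*4+1]],
                       data[row, v[i*4+2]],
                       data[row, v[i*4+3]]))

    Gtru = data[row, v[i*4+4]] + 1   # add 1 to match Gmax range

    cmat[Gmax, Gtru] = cmat[Gmax, Gtru] + 1
  }
}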
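A minimal, self-contained sketch of the same per-row loop, run on a small made-up data frame. The column names (G_hat_0_<i>, G_hat_1_<i>, G_hat_2_<i>, G_true_<i>) are hypothetical stand-ins, since the full names are not shown above. As one variation to try, it pulls the selected columns into a plain numeric matrix once before looping, because element access on a matrix is typically much cheaper in R than data[row, col] on a data frame; how much that helps on the real 10,000-row data is not measured here.

```r
# Hypothetical toy data: 3 rows, 2 individuals (nind = 2).
# Per individual: three score columns, then a true genotype coded 0, 1, 2.
# Column names are illustrative only.
nind <- 2
data <- data.frame(
  G_hat_0_0 = c(0.7, 0.1, 0.2), G_hat_1_0 = c(0.2, 0.8, 0.3),
  G_hat_2_0 = c(0.1, 0.1, 0.5), G_true_0  = c(0, 1, 1),
  G_hat_0_1 = c(0.6, 0.3, 0.1), G_hat_1_1 = c(0.3, 0.3, 0.1),
  G_hat_2_1 = c(0.1, 0.4, 0.8), G_true_1  = c(0, 2, 2)
)

# Build all column names (4 per individual), then map them to positions.
cols <- as.vector(sapply(0:(nind - 1), function(i)
  paste0(c("G_hat_0_", "G_hat_1_", "G_hat_2_", "G_true_"), i)))
v <- match(cols, names(data))

# Extract the needed columns once as a numeric matrix so the loop
# indexes a matrix instead of repeatedly subsetting the data frame.
m <- as.matrix(data[v])

cmat <- matrix(0, nrow = 3, ncol = 3)  # rows: predicted, cols: true
for (row in seq_len(nrow(m))) {
  for (i in 0:(nind - 1)) {
    Gmax <- which.max(m[row, i * 4 + 1:3])  # position of the largest score
    Gtru <- m[row, i * 4 + 4] + 1           # shift 0..2 to 1..3
    cmat[Gmax, Gtru] <- cmat[Gmax, Gtru] + 1
  }
}
print(cmat)
```

Each (row, individual) pair adds exactly one count, so the cells of cmat sum to nrow(data) * nind.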

DAVID: Was this what you had in mind?  I had trouble implementing the vector
of indices as you had done.  It generated a bunch of warnings.

By far the best solution was that offered by Gabor.  His technique finished
the job in a whopping 9.8 SECONDS.  It took me about 15 minutes to
understand what it was doing, but the lesson is one I will never forget.  I
must admit, it was a wickedly clever solution.

I implemented it virtually identically to Gabor's example.  The only
difference is that I used the 'v' vector to subset the data frame because in
reality the data has many other unrelated columns.

# one column per (row, individual): three scores, then the true genotype
mat <- matrix(t(data[v]), 4)
table(Gmax = apply(mat[-4,], 2, which.max), Gtru = mat[4,] + 1)
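A small hedged walk-through of why these two lines work, on a made-up data frame whose column names are hypothetical (three score columns followed by one true-genotype column per individual, plus one unrelated column). t(data[v]) turns each original row into a column, so reading it column-major with matrix(..., 4) yields one column per (row, individual) pair: rows 1-3 hold the three scores and row 4 holds the true genotype. apply(mat[-4, ], 2, which.max) then finds the predicted class for every pair at once, and table() cross-tabulates it against the true class. One difference from a preallocated 3 x 3 count matrix: table() only includes values that actually occur, so this sketch wraps both vectors in factor(..., levels = 1:3) to keep the shape fixed.

```r
# Hypothetical toy data: 3 rows, 2 individuals, plus an unrelated column.
data <- data.frame(
  G_hat_0_0 = c(0.7, 0.1, 0.2), G_hat_1_0 = c(0.2, 0.8, 0.3),
  G_hat_2_0 = c(0.1, 0.1, 0.5), G_true_0  = c(0, 1, 1),
  G_hat_0_1 = c(0.6, 0.3, 0.1), G_hat_1_1 = c(0.3, 0.3, 0.1),
  G_hat_2_1 = c(0.1, 0.4, 0.8), G_true_1  = c(0, 2, 2),
  other = c("a", "b", "c")   # unrelated column, excluded via v
)

# Column positions in groups of 4: three scores, then the true genotype.
nind <- 2
cols <- as.vector(sapply(0:(nind - 1), function(i)
  paste0(c("G_hat_0_", "G_hat_1_", "G_hat_2_", "G_true_"), i)))
v <- match(cols, names(data))

# 4 x (nrow * nind): each column is one (row, individual) pair.
mat <- matrix(t(data[v]), 4)
print(mat)

# Predicted class = position of the largest score; true class shifted to 1..3.
cmat <- table(Gmax = factor(apply(mat[-4, ], 2, which.max), levels = 1:3),
              Gtru = factor(mat[4, ] + 1, levels = 1:3))
print(cmat)
```

Because the whole reshape and the per-column which.max run as a handful of vectorized calls, no R-level loop touches individual data-frame cells.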
