[R] Speeding up code

MacQueen, Don macqueen1 at llnl.gov
Mon Nov 25 19:45:46 CET 2013


Ditto to everything Jeff Newmiller said, but I'll take it a little further.

I'm guessing that with
   df <- data.frame(31790,31790)
you thought you were creating something with 31790 rows and 31790 columns.
You weren't. You were creating a data frame with one row and two columns:

> data.frame(31790,31790)
  X31790 X31790.1
1  31790    31790

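Given that your loop assigns values to df[i,j], and that you started
with just one row and two columns, every assignment to a row or column
that doesn't exist yet forces R to enlarge the data frame (copying it
in the process), and that will slow things down. Element-by-element
assignment into a data frame is also much slower than into a matrix,
even when nothing has to grow.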
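A small illustration of that growth, starting from the same data frame
as in your code. The two assignments mimic what your loop does on its
first few iterations (one new column, then one new row):

```r
# Start from the same one-row, two-column data frame as in the question
df <- data.frame(31790, 31790)
dim(df)        # 1 2

# Assigning one column past the end silently adds a column ...
df[1, 3] <- 0.5
dim(df)        # 1 3

# ... and assigning past the last row adds a row, padded with NA
df[2, 1] <- 0.25
dim(df)        # 2 3
```

Each of those enlargements builds a new data frame behind the scenes,
which is why doing it tens of thousands of times is so slow.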

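Initialize with a matrix (I'll call it 'res' instead of 'df'):

  res <- matrix(NA_real_, 31790, 31790)

Using NA_real_ rather than plain NA makes the matrix numeric from the
start, so R doesn't have to convert it from logical on the first
assignment of a correlation.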
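It may be worth checking what that allocation costs before running it.
A quick sketch of the arithmetic, assuming 8 bytes per double-precision
cell (R's standard numeric type); since your loop only fills cells with
j >= i, roughly half of them will stay NA anyway:

```r
n <- 31790
bytes <- n^2 * 8       # a double takes 8 bytes per cell
bytes / 2^30           # roughly 7.53 GiB for the whole matrix

# plain NA is logical; NA_real_ gives a numeric (double) matrix from the start
typeof(matrix(NA, 2, 2))        # "logical"
typeof(matrix(NA_real_, 2, 2))  # "double"
```

On a machine without that much free memory, the full matrix may not be
feasible, whatever the loop does.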

Then inside your loop, you can use

   if (dis2 <= 500) res[i, j] <- ken

No need to deal with 'else', since the matrix is initialized
with NA.
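To see the pattern on a toy scale, here is a sketch with a 4 x 4
matrix. The distance matrix 'dis' and the constant 'ken' are made-up
placeholders, not your data; the point is only that cells failing the
test are never touched and so remain NA:

```r
n <- 4
res <- matrix(NA_real_, n, n)   # preallocated; every cell starts as NA

# hypothetical pairwise distances, for illustration only
dis <- matrix(c(  0, 200, 900, 450,
                200,   0, 300, 700,
                900, 300,   0, 100,
                450, 700, 100,   0), n, n, byrow = TRUE)

for (i in 1:n) {
  for (j in i:n) {
    dis2 <- dis[i, j]
    ken  <- 0.5                          # stand-in for the cor() result
    if (dis2 <= 500) res[i, j] <- ken    # no 'else': the cell stays NA
  }
}
res
```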

The ifelse() function was a less-than-ideal choice,
since it is designed for vector arguments, and your value, dis2,
appears to always have length 1. You could have used
  df[i,j] <- if (dis2 <= 500) ken else NA
but as I mentioned above, if you initialize to NA there's no need
to handle the 'else' case inside the loop.
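A short comparison of the two, using made-up numbers: if/else is the
tool for a single value, while ifelse() tests every element of a vector
at once:

```r
dis2 <- 320     # a single (length-1) distance, as inside the loop
ken  <- 0.42

# Scalar case: plain if/else
val <- if (dis2 <= 500) ken else NA
val                          # 0.42

# Vector case: ifelse() evaluates the test element by element
d <- c(120, 650, 480)
k <- c(0.10, 0.20, 0.30)
ifelse(d <= 500, k, NA)      # 0.1  NA  0.3
```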

It may be possible to vectorize your loop, but I kind of doubt it,
considering that you're calling cor() and then deg.dist()
at every iteration.

However, you could calculate the dis2 value first, and then calculate
ken only when dis2 is <= 500. You're calculating ken even when it's not
needed. Avoiding that should speed things up.
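Here is a sketch of that reordering on toy data. Everything except the
loop structure is an assumption: 'cldm' is a made-up 6-row matrix laid
out like yours (column 1 assumed to be latitude, column 2 longitude,
columns 3:17 the variables), and deg_dist_demo() is a hypothetical
haversine stand-in for deg.dist(), whose real definition I don't know:

```r
# Hypothetical stand-in for deg.dist(): great-circle distance in km
# via the haversine formula. Illustration only.
deg_dist_demo <- function(long1, lat1, long2, lat2, radius = 6371) {
  r <- pi / 180
  a <- sin((lat2 - lat1) * r / 2)^2 +
       cos(lat1 * r) * cos(lat2 * r) * sin((long2 - long1) * r / 2)^2
  2 * radius * asin(sqrt(pmin(1, a)))
}

# Hypothetical toy data: three nearby points, two nearby points, one loner
set.seed(1)
lat  <- c(40.0, 40.5, 41.0, 45.0, 45.2, 30.0)
long <- c(-120.0, -120.3, -119.8, -110.0, -110.5, -100.0)
cldm <- cbind(lat, long, matrix(rnorm(6 * 15), 6, 15))
n <- nrow(cldm)

res <- matrix(NA_real_, n, n)
for (i in 1:n) {
  for (j in i:n) {
    # cheap distance test first ...
    dis2 <- deg_dist_demo(cldm[i, 2], cldm[i, 1], cldm[j, 2], cldm[j, 1])
    # ... and the expensive correlation only when it will be stored
    if (dis2 <= 500) {
      res[i, j] <- cor(cldm[i, 3:17], cldm[j, 3:17],
                       method = "kendall", use = "pairwise.complete.obs")
    }
  }
}
res
```

With these coordinates only 10 of the 21 upper-triangle pairs (including
the diagonal) pass the 500 km test, so cor() runs 10 times instead of 21.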

I don't know what deg.dist() is doing, but if it is calculating distances
between points, there are functions for doing that on whole bunches
of points at once (for example dist() in base R, or distm() in the
geosphere package for longitude/latitude). Perhaps your data could be
rearranged to work with one of those.
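A sketch of that idea, under loudly stated assumptions: hav_km() is a
hypothetical haversine great-circle distance written here for
illustration (not deg.dist() itself), and 'cldm' is made-up toy data
with column 1 assumed to be latitude, column 2 longitude and columns
3:17 the variables. Because the distance arithmetic is vectorized, one
call per i gives the distances to every j >= i, and the inner loop then
visits only the nearby pairs:

```r
# Hypothetical vectorized haversine distance in km (illustration only)
hav_km <- function(long1, lat1, long2, lat2, radius = 6371) {
  r <- pi / 180
  a <- sin((lat2 - lat1) * r / 2)^2 +
       cos(lat1 * r) * cos(lat2 * r) * sin((long2 - long1) * r / 2)^2
  2 * radius * asin(sqrt(pmin(1, a)))
}

# Hypothetical toy data in the same layout as cldm
set.seed(1)
lat  <- c(40.0, 40.5, 41.0, 45.0, 45.2, 30.0)
long <- c(-120.0, -120.3, -119.8, -110.0, -110.5, -100.0)
cldm <- cbind(lat, long, matrix(rnorm(6 * 15), 6, 15))
n <- nrow(cldm)

res <- matrix(NA_real_, n, n)
for (i in 1:n) {
  js <- i:n
  # one vectorized call: distances from point i to every point j >= i
  d <- hav_km(cldm[i, 2], cldm[i, 1], cldm[js, 2], cldm[js, 1])
  # only the nearby pairs go on to the (expensive) correlation
  for (j in js[d <= 500]) {
    res[i, j] <- cor(cldm[i, 3:17], cldm[j, 3:17],
                     method = "kendall", use = "pairwise.complete.obs")
  }
}
res
```

The cor() calls still happen one pair at a time, but the distance step
no longer costs one function call per pair.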

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 11/23/13 1:39 PM, "Amie Hunter" <amie_hunter at hotmail.com> wrote:

>Hello R experts, 
>
>I'm new to R and I'm wanting to know what is the best way to speed up my
>code. I've read that you can vectorize the code but I'm unsure on how to
>implement this into my code.
>
>
>df <- data.frame(31790,31790)
>
>for (i in 1:31790)
>{
>  for (j in i:31790)
>  {
>    ken<-cor(cldm[i,3:17],cldm[j,3:17], method="kendall", use="pairwise")
>    dis2<-deg.dist(cldm[i,2],cldm[i,1],cldm[j,2],cldm[j,1])
>    
>    df[i,j]<-ifelse(dis2<=500,ken,NA)
>    }
>  } 
>df
>
>Thanks!
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.


