[R] Need help optimizing/vectorizing nested loops

Tue Dec 9 18:38:32 CET 2008

On Tue, 9 Dec 2008, tyler wrote:

> Hi,
>
> I'm analyzing a large number of large simulation datasets, and I've
> isolated one of the bottlenecks. Any help in speeding it up would be
> appreciated.

Cast the neighborhoods as an indicator matrix, then use matrix 
multiplications:

>  system.time(tmp <- time.test(dat))
    user  system elapsed
    1.18    0.00    1.20
> system.time({
+ mn <- with(dat,outer(1:25,1:25, function(i,j) abs(X[i]-X[j])<2 & abs(Y[i]-Y[j])<2 & i!=j ))
+ print(all.equal(rowSums(mn%*%as.matrix(dat[,-(1:2)])>0),tmp))})
[1] TRUE
    user  system elapsed
       0       0       0

HTH,

Chuck

>
> `dat` is a dataframe of samples from a regular grid. The first two
> columns are the spatial coordinates of the samples, the remaining 20
> columns are the abundances of species in each cell. I need to calculate
> the species richness in adjacent cells for each cell in the sample.
> For example, if I have nine cells in my dataframe (X = 1:3, Y = 1:3):
>
>  a b c
>  d e f
>  g h i
>
> I need to calculate the neighbour-richness for each cell; for a, this is
> the richness of cells b, d and e combined. The neighbour richness of
> cell e would be the combined richness of all the other eight cells.
>
> The following code does what I what, but it's slow. The sample dataset
> 'dat', below, represents a 5x5 grid, 25 samples. It takes about 1.5
> seconds on my computer. The largest samples I am working with have a 51
> x 51 grid (2601 cells) and take 4.5 minutes. This is manageable, but
> since I have potentially hundreds of these analyses to run, trimming
> that down would be very helpful.
>
> After loading the function and the data, the call
>
>  system.time(tmp <- time.test(dat))
>
> Will run the code. Note that I've excised this from a larger, more
> general function, after determining that for large datasets this section
> is responsible for a slowdown from 10-12 seconds to ca. 250 seconds.
>
> Thanks for your patience,
>
> Tyler
>
>
> time.test <- function(dat) {
>
>  cen <- dat
>  grps <- 5
>  n.rich <- numeric(grps^2)
>  n.ind <- 1
>
>  for(i in 1:grps)
>    for (j in 1:grps) {
>      n.cen <- numeric(ncol(cen) - 2)
>      neighbours <- expand.grid((j-1):(j+1), (i-1):(i+1))
>      neighbours <- neighbours[-5,]
>      neighbours <- neighbours[which(neighbours[,1] %in% 1:grps &
>                                     neighbours[,2] %in% 1:grps),]
>
>      for (k in 1:nrow(neighbours))
>        n.cen <- n.cen + cen[cen$X == neighbours[k,1] &
>                             cen$Y == neighbours[k,2], -c(1:2)]
>
>      n.rich[n.ind] <- sum(as.logical(n.cen))
>      n.ind <- n.ind + 1
>    }
>
>    return(n.rich)
> }
>
> `dat` <- structure(list(
>  X = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1,
>  2, 3, 4, 5), Y = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4,
>  4, 4, 4, 4, 5, 5, 5, 5, 5), V1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 45L, 131L, 0L, 0L, 34L,
>  481L, 1744L), V2 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>  0L, 1L, 88L, 0L, 70L, 101L, 13L, 634L, 0L, 0L, 71L, 640L, 1636L), V3
>  = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 49L, 3L, 113L,
>  1L, 44L, 167L, 336L, 933L, 0L, 14L, 388L, 1180L, 1709L), V4 = c(0L,
>  0L, 0L, 0L, 0L, 0L, 3L, 12L, 0L, 0L, 2L, 1L, 36L, 45L, 208L, 7L,
>  221L, 213L, 371L, 1440L, 26L, 211L, 389L, 1382L, 1614L), V5 = c(96L,
>  7L, 0L, 0L, 0L, 10L, 17L, 0L, 5L, 0L, 0L, 11L, 151L, 127L, 160L,
>  27L, 388L, 439L, 1117L, 1571L, 81L, 598L, 1107L, 1402L, 891L), V6 =
>  c(16L, 30L, 13L, 0L, 0L, 10L, 195L, 60L, 29L, 29L, 1L, 107L, 698L,
>  596L, 655L, 227L, 287L, 677L, 1477L, 1336L, 425L, 873L, 961L, 1360L,
>  1175L), V7 = c(249L, 101L, 69L, 0L, 18L, 186L, 331L, 291L, 259L,
>  248L, 336L, 404L, 642L, 632L, 775L, 455L, 801L, 697L, 1063L, 978L,
>  626L, 686L, 1204L, 1138L, 627L), V8 = c(300L, 163L, 65L, 145L, 377L,
>  257L, 690L, 655L, 420L, 288L, 346L, 461L, 1276L, 897L, 633L, 812L,
>  1018L, 1337L, 1295L, 1163L, 550L, 1104L, 768L, 933L, 433L), V9 =
>  c(555L, 478L, 374L, 349L, 357L, 360L, 905L, 954L, 552L, 438L, 703L,
>  984L, 1616L, 1732L, 1234L, 1213L, 1518L, 1746L, 1191L, 967L, 1394L,
>  1722L, 1706L, 610L, 169L), V10 = c(1527L, 1019L, 926L, 401L, 830L,
>  833L, 931L, 816L, 1126L, 1232L, 1067L, 1169L, 1270L, 1277L, 1145L,
>  1159L, 1072L, 1534L, 997L, 391L, 1328L, 1414L, 1037L, 444L, 1L), V11
>  = c(1468L, 1329L, 1013L, 603L, 1096L, 1237L, 1488L, 1189L, 1064L,
>  1303L, 1258L, 1479L, 1421L, 1365L, 1101L, 1415L, 1145L, 1329L,
>  1325L, 236L, 1379L, 1199L, 729L, 328L, 0L), V12 = c(983L, 1459L,
>  791L, 898L, 911L, 1215L, 1528L, 960L, 1172L, 1286L, 1358L, 722L,
>  857L, 1478L, 1452L, 1502L, 1013L, 745L, 455L, 149L, 1686L, 917L,
>  1013L, 84L, 0L), V13 = c(1326L, 1336L, 1110L, 1737L, 1062L, 1578L,
>  1382L, 1537L, 1366L, 1308L, 1301L, 1357L, 746L, 622L, 934L, 1132L,
>  954L, 460L, 270L, 65L, 957L, 699L, 521L, 18L, 1L), V14 = c(1047L,
>  1315L, 1506L, 1562L, 1254L, 1336L, 1106L, 1213L, 1220L, 1457L, 858L,
>  1606L, 590L, 726L, 598L, 945L, 732L, 258L, 45L, 6L, 937L, 436L, 43L,
>  0L, 0L), V15 = c(845L, 935L, 1295L, 1077L, 1400L, 1049L, 802L,
>  1247L, 1449L, 1046L, 1134L, 877L, 327L, 352L, 470L, 564L, 461L,
>  166L, 0L, 0L, 230L, 110L, 29L, 0L, 0L), V16 = c(784L, 675L, 1157L,
>  1488L, 1511L, 1004L, 420L, 523L, 733L, 724L, 833L, 542L, 171L, 116L,
>  384L, 357L, 197L, 0L, 0L, 0L, 246L, 0L, 0L, 0L, 0L), V17 = c(444L,
>  873L, 530L, 596L, 448L, 431L, 109L, 446L, 378L, 243L, 284L, 148L,
>  69L, 30L, 6L, 71L, 32L, 131L, 0L, 0L, 120L, 0L, 0L, 0L, 0L), V18 =
>  c(307L, 128L, 823L, 566L, 383L, 454L, 62L, 90L, 99L, 261L, 351L,
>  94L, 81L, 0L, 37L, 113L, 28L, 0L, 0L, 0L, 15L, 17L, 0L, 0L, 0L), V19
>  = c(53L, 138L, 207L, 451L, 282L, 40L, 31L, 7L, 128L, 137L, 166L,
>  38L, 0L, 1L, 0L, 0L, 19L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V20 =
>  c(0L, 14L, 121L, 127L, 71L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 7L,
>  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("X", "Y", "V1",
>  "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12",
>  "V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20"), row.names =
>  c(1L, 2L, 3L, 4L, 5L, 7L, 8L, 9L, 10L, 11L, 13L, 14L, 15L, 16L, 17L,
>  19L, 20L, 21L, 22L, 23L, 25L, 26L, 27L, 28L, 29L), class =
>  "data.frame")
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901