[R] Need help optimizing/vectorizing nested loops

Tue Dec 9 14:59:51 CET 2008

Hi,

I'm analyzing a large number of large simulation datasets, and I've
isolated one of the bottlenecks. Any help in speeding it up would be
appreciated.

`dat` is a dataframe of samples from a regular grid. The first two
columns are the spatial coordinates of the samples, the remaining 20
columns are the abundances of species in each cell. I need to calculate
the species richness in adjacent cells for each cell in the sample. 
For example, if I have nine cells in my dataframe (X = 1:3, Y = 1:3):

  a b c
  d e f
  g h i

I need to calculate the neighbour-richness for each cell; for a, this is
the richness of cells b, d and e combined. The neighbour richness of
cell e would be the combined richness of all the other eight cells.

The following code does what I what, but it's slow. The sample dataset
'dat', below, represents a 5x5 grid, 25 samples. It takes about 1.5
seconds on my computer. The largest samples I am working with have a 51
x 51 grid (2601 cells) and take 4.5 minutes. This is manageable, but
since I have potentially hundreds of these analyses to run, trimming
that down would be very helpful.

After loading the function and the data, the call

  system.time(tmp <- time.test(dat))

Will run the code. Note that I've excised this from a larger, more
general function, after determining that for large datasets this section
is responsible for a slowdown from 10-12 seconds to ca. 250 seconds.

Thanks for your patience,

Tyler

time.test <- function(dat) {

  cen <- dat
  grps <- 5
  n.rich <- numeric(grps^2)
  n.ind <- 1

  for(i in 1:grps)
    for (j in 1:grps) {
      n.cen <- numeric(ncol(cen) - 2)
      neighbours <- expand.grid((j-1):(j+1), (i-1):(i+1)) 
      neighbours <- neighbours[-5,] 
      neighbours <- neighbours[which(neighbours[,1] %in% 1:grps &
                                     neighbours[,2] %in% 1:grps),]

      for (k in 1:nrow(neighbours))
        n.cen <- n.cen + cen[cen$X == neighbours[k,1] &
                             cen$Y == neighbours[k,2], -c(1:2)]

      n.rich[n.ind] <- sum(as.logical(n.cen))
      n.ind <- n.ind + 1
    }

    return(n.rich)
}

`dat` <- structure(list(
  X = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1,
  2, 3, 4, 5), Y = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4,
  4, 4, 4, 4, 5, 5, 5, 5, 5), V1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 45L, 131L, 0L, 0L, 34L,
  481L, 1744L), V2 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
  0L, 1L, 88L, 0L, 70L, 101L, 13L, 634L, 0L, 0L, 71L, 640L, 1636L), V3
  = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 49L, 3L, 113L,
  1L, 44L, 167L, 336L, 933L, 0L, 14L, 388L, 1180L, 1709L), V4 = c(0L,
  0L, 0L, 0L, 0L, 0L, 3L, 12L, 0L, 0L, 2L, 1L, 36L, 45L, 208L, 7L,
  221L, 213L, 371L, 1440L, 26L, 211L, 389L, 1382L, 1614L), V5 = c(96L,
  7L, 0L, 0L, 0L, 10L, 17L, 0L, 5L, 0L, 0L, 11L, 151L, 127L, 160L,
  27L, 388L, 439L, 1117L, 1571L, 81L, 598L, 1107L, 1402L, 891L), V6 =
  c(16L, 30L, 13L, 0L, 0L, 10L, 195L, 60L, 29L, 29L, 1L, 107L, 698L,
  596L, 655L, 227L, 287L, 677L, 1477L, 1336L, 425L, 873L, 961L, 1360L,
  1175L), V7 = c(249L, 101L, 69L, 0L, 18L, 186L, 331L, 291L, 259L,
  248L, 336L, 404L, 642L, 632L, 775L, 455L, 801L, 697L, 1063L, 978L,
  626L, 686L, 1204L, 1138L, 627L), V8 = c(300L, 163L, 65L, 145L, 377L,
  257L, 690L, 655L, 420L, 288L, 346L, 461L, 1276L, 897L, 633L, 812L,
  1018L, 1337L, 1295L, 1163L, 550L, 1104L, 768L, 933L, 433L), V9 =
  c(555L, 478L, 374L, 349L, 357L, 360L, 905L, 954L, 552L, 438L, 703L,
  984L, 1616L, 1732L, 1234L, 1213L, 1518L, 1746L, 1191L, 967L, 1394L,
  1722L, 1706L, 610L, 169L), V10 = c(1527L, 1019L, 926L, 401L, 830L,
  833L, 931L, 816L, 1126L, 1232L, 1067L, 1169L, 1270L, 1277L, 1145L,
  1159L, 1072L, 1534L, 997L, 391L, 1328L, 1414L, 1037L, 444L, 1L), V11
  = c(1468L, 1329L, 1013L, 603L, 1096L, 1237L, 1488L, 1189L, 1064L,
  1303L, 1258L, 1479L, 1421L, 1365L, 1101L, 1415L, 1145L, 1329L,
  1325L, 236L, 1379L, 1199L, 729L, 328L, 0L), V12 = c(983L, 1459L,
  791L, 898L, 911L, 1215L, 1528L, 960L, 1172L, 1286L, 1358L, 722L,
  857L, 1478L, 1452L, 1502L, 1013L, 745L, 455L, 149L, 1686L, 917L,
  1013L, 84L, 0L), V13 = c(1326L, 1336L, 1110L, 1737L, 1062L, 1578L,
  1382L, 1537L, 1366L, 1308L, 1301L, 1357L, 746L, 622L, 934L, 1132L,
  954L, 460L, 270L, 65L, 957L, 699L, 521L, 18L, 1L), V14 = c(1047L,
  1315L, 1506L, 1562L, 1254L, 1336L, 1106L, 1213L, 1220L, 1457L, 858L,
  1606L, 590L, 726L, 598L, 945L, 732L, 258L, 45L, 6L, 937L, 436L, 43L,
  0L, 0L), V15 = c(845L, 935L, 1295L, 1077L, 1400L, 1049L, 802L,
  1247L, 1449L, 1046L, 1134L, 877L, 327L, 352L, 470L, 564L, 461L,
  166L, 0L, 0L, 230L, 110L, 29L, 0L, 0L), V16 = c(784L, 675L, 1157L,
  1488L, 1511L, 1004L, 420L, 523L, 733L, 724L, 833L, 542L, 171L, 116L,
  384L, 357L, 197L, 0L, 0L, 0L, 246L, 0L, 0L, 0L, 0L), V17 = c(444L,
  873L, 530L, 596L, 448L, 431L, 109L, 446L, 378L, 243L, 284L, 148L,
  69L, 30L, 6L, 71L, 32L, 131L, 0L, 0L, 120L, 0L, 0L, 0L, 0L), V18 =
  c(307L, 128L, 823L, 566L, 383L, 454L, 62L, 90L, 99L, 261L, 351L,
  94L, 81L, 0L, 37L, 113L, 28L, 0L, 0L, 0L, 15L, 17L, 0L, 0L, 0L), V19
  = c(53L, 138L, 207L, 451L, 282L, 40L, 31L, 7L, 128L, 137L, 166L,
  38L, 0L, 1L, 0L, 0L, 19L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V20 =
  c(0L, 14L, 121L, 127L, 71L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 7L,
  0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("X", "Y", "V1",
  "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12",
  "V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20"), row.names =
  c(1L, 2L, 3L, 4L, 5L, 7L, 8L, 9L, 10L, 11L, 13L, 14L, 15L, 16L, 17L,
  19L, 20L, 21L, 22L, 23L, 25L, 26L, 27L, 28L, 29L), class =
  "data.frame")