[R] how to use 'which' inside of 'apply'?

Mon Oct 17 20:00:00 CEST 2011

Try vectorizing it a bit by looping over the columns.
E.g.,

  f1 <- function (df)
  {
      # loop (backwards) over all columns in df whose
      # names start with "D" to find the earliest one
      # that is bigger than column "thold".  I tested with
      # df being a data.frame but a matrix should work too.
      i <- rep(NA_character_, nrow(df))
      colNames <- grep(value = TRUE, "^D", colnames(df))
      for (colName in rev(colNames)) {
          i[df[, colName] > df[, "thold"]] <- colName
      }
      # convert column name "D<number>" to <number>.
      doy <- as.numeric(sub("^D", "", i))
      doy
  }
  > f1(a)
  [1] 129 145 129 177 177 177

You could also try looping over rows with something like
findInterval.  If there are far fewer columns than rows
then looping over columns is generally faster.

Your sample data.frame I called 'a' and in copy-and-pastable
form (from dput()) is
a <- structure(list(pt = c(39177L, 39178L, 39164L, 39143L, 39144L,
39146L), D1 = c(0L, 0L, 0L, 0L, 0L, 0L), D17 = c(0L, 0L, 0L,
0L, 0L, 0L), D33 = c(0L, 0L, 0L, 0L, 0L, 0L), D49 = c(0L, 0L,
0L, 0L, 0L, 0L), D65 = c(0L, 0L, 0L, 0L, 0L, 0L), D81 = c(0L,
0L, 0L, 0L, 0L, 0L), D97 = c(0L, 0L, 0L, 0L, 0L, 0L), D113 = c(0L,
0L, 0L, 0L, 0L, 0L), D129 = c(0.4336, 0.342, 0.483, 0.3088, 0.339,
0.4232), D145 = c(0.4754, 0.4543, 0.4943, 0.3753, 0.4152, 0.4442
), D161 = c(0.5340667, 0.5397666, 0.5740333, 0.4466, 0.5147,
0.5084), D177 = c(0.5927334, 0.6252333, 0.6537667, 0.5179, 0.6142,
0.5726), D193 = c(0.6514, 0.7107, 0.7335, 0.5892, 0.7137, 0.6368
), D209 = c(0.6966, 0.7123, 0.6255, 0.6468, 0.6914, 0.5896),
    D225 = c(0.59, 0.5591, 0.6228, 0.4794, 0.6381, 0.4703), D241 = c(0.5583,
    0.4617, 0.5255, 0.4411, 0.5704, 0.4936), D257 = c(0.5676,
    0.4206, 0.5436, 0.4307, 0.5619, 0.5353), D273 = c(0.4682,
    0.3867, 0.5541, 0.3632, 0.5347, 0.4067), D289 = c(0.35115,
    0.2578, 0.46195, 0.34355, 0.4976, 0.39685), D305 = c(0.2341,
    0.1289, 0.3698, 0.3239, 0.4605, 0.387), D321 = c(0.11705,
    0, 0.1849, 0, 0, 0), D337 = c(0L, 0L, 0L, 0L, 0L, 0L), D353 = c(0L,
    0L, 0L, 0L, 0L, 0L), thold = c(0.406825, 0.4206, 0.4592,
    0.4778, 0.52635, 0.5119), doy = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("pt",
"D1", "D17", "D33", "D49", "D65", "D81", "D97", "D113", "D129",
"D145", "D161", "D177", "D193", "D209", "D225", "D241", "D257",
"D273", "D289", "D305", "D321", "D337", "D353", "thold", "doy"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6"))

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of R. Michael
> Weylandt
> Sent: Monday, October 17, 2011 10:32 AM
> To: Nathan Piekielek
> Cc: r-help at r-project.org
> Subject: Re: [R] how to use 'which' inside of 'apply'?
> 
> I think something like this should do it at a huge speed up, though
> I'd advise you check it to make sure it does exactly what you want:
> there's also nothing to guarantee that something beats the threshold,
> so that might make the whole thing fall apart (though I don't think it
> will)
> 
> # Sample data
> df = data.frame(x = sample(5, 15,T),
> 			y = sample(5, 15, T),
> 			z = sample(5, 15,T),
> 			w = (1:5)/2 + 0.5,
> 			th = (1:5)/2,
> 			doy = rep(0,15))
> 
> wd <- which(df[,1:4] > df[,5], arr.ind = TRUE)
> # identify all elements that beat the threshold value by their indices
> 
> wd <- wd[!duplicated(wd[,1]),]
> # select only the first appearance of each "row" value in wd -- this
> keeps the earliest column beating the threshold
> 
> wd <- wd[order(wd[,"row"]),]
> # sort them by row
> 
> df$doy = (wd[,"col"]-1)*16 + 1
> # The column transform you used.
> 
> Hope this helps,
> 
> Michael
> 
> 
> On Mon, Oct 17, 2011 at 1:03 PM, Nathan Piekielek <npiekielek at gmail.com> wrote:
> > Hello R-community,
> >
> > I am trying to populate a column (doy) in a large dataset with the first
> > column number that exceeds the value in another column (thold) using the
> > 'apply' function.
> >
> > Sample data:
> >     pt D1 D17 D33 D49 D65 D81 D97 D113   D129   D145      D161      D177
> > D193   D209   D225   D241   D257
> > 1 39177  0   0   0   0   0   0   0    0 0.4336 0.4754 0.5340667 0.5927334
> > 0.6514 0.6966 0.5900 0.5583 0.5676
> > 2 39178  0   0   0   0   0   0   0    0 0.3420 0.4543 0.5397666 0.6252333
> > 0.7107 0.7123 0.5591 0.4617 0.4206
> > 3 39164  0   0   0   0   0   0   0    0 0.4830 0.4943 0.5740333 0.6537667
> > 0.7335 0.6255 0.6228 0.5255 0.5436
> > 4 39143  0   0   0   0   0   0   0    0 0.3088 0.3753 0.4466000 0.5179000
> > 0.5892 0.6468 0.4794 0.4411 0.4307
> > 5 39144  0   0   0   0   0   0   0    0 0.3390 0.4152 0.5147000 0.6142000
> > 0.7137 0.6914 0.6381 0.5704 0.5619
> > 6 39146  0   0   0   0   0   0   0    0 0.4232 0.4442 0.5084000 0.5726000
> > 0.6368 0.5896 0.4703 0.4936 0.5353
> >    D273    D289   D305    D321 D337 D353    thold doy
> > 1 0.4682 0.35115 0.2341 0.11705    0    0 0.406825   0
> > 2 0.3867 0.25780 0.1289 0.00000    0    0 0.420600   0
> > 3 0.5541 0.46195 0.3698 0.18490    0    0 0.459200   0
> > 4 0.3632 0.34355 0.3239 0.00000    0    0 0.477800   0
> > 5 0.5347 0.49760 0.4605 0.00000    0    0 0.526350   0
> > 6 0.4067 0.39685 0.3870 0.00000    0    0 0.511900   0
> >
> > For the first record in above example I would expect doy = 129.
> >
> > I can achieve this with the following loop, but it takes several days to run
> > and there must be a more efficient solution:
> >
> > for (i in (1:152000)) {
> > t=which(data[i,2:24]>data[i,25])
> > r=min(t)
> > data[i,26]=(r-1)*16+1
> > }
> >
> > How do I write this using 'apply' or another function that will be more
> > efficient?
> >
> > I have tried the following:
> > data$doy=apply(which(data[,2:24]>data[,25]),1,min)
> >
> > Which returns the following error message:
> > "Error in apply(which(new[, 2:24] > new[, 25]), 1, min) :
> >  dim(X) must have a positive length"
> >
> > Any help would be much appreciated.
> >
> > Nathan
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.