[R] how to use 'which' inside of 'apply'?

Mon Oct 17 19:32:19 CEST 2011

I think something like this should do it at a huge speed up, though
I'd advise you check it to make sure it does exactly what you want:
there's also nothing to guarantee that something beats the threshold,
so that might make the whole thing fall apart (though I don't think it
will)

# Sample data
df = data.frame(x = sample(5, 15,T),
			y = sample(5, 15, T),
			z = sample(5, 15,T),
			w = (1:5)/2 + 0.5,
			th = (1:5)/2,
			doy = rep(0,15))

wd <- which(df[,1:4] > df[,5], arr.ind = TRUE)
# identify all elements that beat the threshold value by their indices

wd <- wd[!duplicated(wd[,1]),]
# select only the first appearance of each "row" value in wd -- this
keeps the earliest column beating the threshold

wd <- wd[order(wd[,"row"]),]
# sort them by row

df$doy = (wd[,"col"]-1)*16 + 1
# The column transform you used.

Hope this helps,

Michael

On Mon, Oct 17, 2011 at 1:03 PM, Nathan Piekielek <npiekielek at gmail.com> wrote:
> Hello R-community,
>
> I am trying to populate a column (doy) in a large dataset with the first
> column number that exceeds the value in another column (thold) using the
> 'apply' function.
>
> Sample data:
>     pt D1 D17 D33 D49 D65 D81 D97 D113   D129   D145      D161      D177
> D193   D209   D225   D241   D257
> 1 39177  0   0   0   0   0   0   0    0 0.4336 0.4754 0.5340667 0.5927334
> 0.6514 0.6966 0.5900 0.5583 0.5676
> 2 39178  0   0   0   0   0   0   0    0 0.3420 0.4543 0.5397666 0.6252333
> 0.7107 0.7123 0.5591 0.4617 0.4206
> 3 39164  0   0   0   0   0   0   0    0 0.4830 0.4943 0.5740333 0.6537667
> 0.7335 0.6255 0.6228 0.5255 0.5436
> 4 39143  0   0   0   0   0   0   0    0 0.3088 0.3753 0.4466000 0.5179000
> 0.5892 0.6468 0.4794 0.4411 0.4307
> 5 39144  0   0   0   0   0   0   0    0 0.3390 0.4152 0.5147000 0.6142000
> 0.7137 0.6914 0.6381 0.5704 0.5619
> 6 39146  0   0   0   0   0   0   0    0 0.4232 0.4442 0.5084000 0.5726000
> 0.6368 0.5896 0.4703 0.4936 0.5353
>    D273    D289   D305    D321 D337 D353    thold doy
> 1 0.4682 0.35115 0.2341 0.11705    0    0 0.406825   0
> 2 0.3867 0.25780 0.1289 0.00000    0    0 0.420600   0
> 3 0.5541 0.46195 0.3698 0.18490    0    0 0.459200   0
> 4 0.3632 0.34355 0.3239 0.00000    0    0 0.477800   0
> 5 0.5347 0.49760 0.4605 0.00000    0    0 0.526350   0
> 6 0.4067 0.39685 0.3870 0.00000    0    0 0.511900   0
>
> For the first record in above example I would expect doy = 129.
>
> I can achieve this with the following loop, but it takes several days to run
> and there must be a more efficient solution:
>
> for (i in (1:152000)) {
> t=which(data[i,2:24]>data[i,25])
> r=min(t)
> data[i,26]=(r-1)*16+1
> }
>
> How do I write this using 'apply' or another function that will be more
> efficient?
>
> I have tried the following:
> data$doy=apply(which(data[,2:24]>data[,25]),1,min)
>
> Which returns the following error message:
> "Error in apply(which(new[, 2:24] > new[, 25]), 1, min) :
>  dim(X) must have a positive length"
>
> Any help would be much appreciated.
>
> Nathan
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>