[Rd] findInterval

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Tue Sep 17 18:14:56 CEST 2024


>>>>> Gabor Grothendieck 
>>>>>     on Mon, 16 Sep 2024 11:21:55 -0400 writes:

    > Suppose we have `dat` shown below and we want to find the `y` value
    > corresponding to the last value in `x` equal to the corresponding component
    > of `seek` and we wish to return an output the same length as `seek` using
    > `findInterval` to perform  the search.  This returns the correct result:

    > dat <- data.frame(x = c(2, 2, 3, 4, 4, 4),
    >                   y = c(37, 12, 19, 30, 6, 15),
    >                   seek = 1:6)

    > zero2na <- function(x) replace(x, x == 0, NA)

    > dat |>
    > transform(dat, result = y[ zero2na(findInterval(seek, x)) ] ) |>
    > _$result
    > ## [1] NA 12 19 15 15 15

I'd write that as

    with(dat, y[ zero2na(findInterval(seek, x)) ] )

so I can read it without having to jump through hoops and stand on my head ...
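
For reference, a self-contained sketch of that one-liner, reusing `dat` and
`zero2na` from the quoted message (the name `res` is only for illustration):

```r
dat <- data.frame(x = c(2, 2, 3, 4, 4, 4),
                  y = c(37, 12, 19, 30, 6, 15),
                  seek = 1:6)
zero2na <- function(x) replace(x, x == 0, NA)

## findInterval() gives 0 for 'seek' values left of min(x); turn those into NA
## so that indexing 'y' yields NA instead of silently dropping the element
res <- with(dat, y[ zero2na(findInterval(seek, x)) ])
res
## [1] NA 12 19 15 15 15
```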

    > Since `findInterval` returns an index it is natural that the next step be
    > to use the index and it is also common that we want a result that is the
    > same length as the input.

I think your example, where x and y are of the same length, is
not typical.

Note that the design of   findInterval(x, vec, ..)  is indeed to always return
an index; there isn't any "nomatch", but rather a
- "left of the leftmost", i.e.,  an x[i] < vec[1]  (as 'vec' must be
  sorted increasingly) or
- "right of the rightmost", i.e.,  an x[i] > vec[length(vec)]

and these should give *different* results (and not both the
same).
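
A small sketch of these two boundary cases, using made-up values for `vec` and
`x` and the default arguments only: values left of the leftmost break give 0,
values at or right of the rightmost break give length(vec).

```r
vec <- c(5, 10, 15)            # sorted break points (illustrative values)
x   <- c(1, 5, 12, 15, 99)     # 1 is left of vec[1]; 99 is right of vec[3]

idx <- findInterval(x, vec)
idx
## [1] 0 1 2 3 3
```

So a single "nomatch" value would have to merge the 0 and the length(vec)
cases, which carry different information.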

I don't think 'nomatch' would improve the relatively clean  findInterval()
behavior.

There are  three logical switches  ... which allow   2^3
variants of which I now guess only 6  differ:

Here's some R code showing the possibilities:


(argsTF <- names(formals(findInterval))[-(1:2)]) # "rightmost.closed"  "all.inside" "left.open"       
FT <- c(FALSE, TRUE)
allFT <- as.matrix(expand.grid(rightmost.closed = FT,
                               all.inside       = FT,
                               left.open        = FT))
allFT
(cn <- substr(colnames(allFT), 1,1)) #  "r" "a" "l"

x <- 2:18
v <- c(5, 10, 15) # create two bins [5,10) and [10,15)

fiAll <- apply(allFT, 1, function(r.a.f)
    do.call(findInterval, c(list(x, v), as.list(r.a.f))))

cbind(x, fiAll) # has all info

## must find cool 'column names' for fiAll: construct from r.., a.., l.. = F / T
(cn1 <- apply(`dim<-`(c(".","|")[allFT+1L], dim(allFT)), 1, paste0, collapse=""))
##  "..." "|.." ".|." "||." "..|" "|.|" ".||" "|||"
colnames(fiAll) <- cn1
cbind(x, fiAll) ## --> col. 3 == 4  and  7 == 8
##==> show only unique columns:
cbind(x, t(unique(t(fiAll))))
 ##  x ... |.. .|. ..| |.| .||
 ##  2   0   0   1   0   0   1
 ##  3   0   0   1   0   0   1
 ##  4   0   0   1   0   0   1
 ##  5   1   1   1   0   1   1
 ##  6   1   1   1   1   1   1
 ##  7   1   1   1   1   1   1
 ##  8   1   1   1   1   1   1
 ##  9   1   1   1   1   1   1
 ## 10   2   2   2   1   1   1
 ## 11   2   2   2   2   2   2
 ## 12   2   2   2   2   2   2
 ## 13   2   2   2   2   2   2
 ## 14   2   2   2   2   2   2
 ## 15   3   2   2   2   2   2
 ## 16   3   3   2   3   3   2
 ## 17   3   3   2   3   3   2
 ## 18   3   3   2   3   3   2
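
The "only 6 differ" guess can be checked directly. The following is a
self-contained sketch that recomputes the matrix from above; the names
`same34`, `same78` and `nUniq` are introduced here only for illustration:

```r
FT <- c(FALSE, TRUE)
allFT <- as.matrix(expand.grid(rightmost.closed = FT,
                               all.inside       = FT,
                               left.open        = FT))
x <- 2:18
v <- c(5, 10, 15)
## one column of findInterval() results per combination of the 3 switches
fiAll <- apply(allFT, 1, function(sw)
    do.call(findInterval, c(list(x, v), as.list(sw))))

## with all.inside = TRUE, rightmost.closed has no visible effect here:
same34 <- identical(fiAll[, 3], fiAll[, 4])
same78 <- identical(fiAll[, 7], fiAll[, 8])
## number of distinct result columns
nUniq  <- ncol(unique(fiAll, MARGIN = 2))
c(same34, same78, nUniq)
```

This only confirms the count for this particular `x` and `v`, of course.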
  

Martin


