[Rd] findInterval
Martin Maechler
m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Tue Sep 17 18:14:56 CEST 2024
>>>>> Gabor Grothendieck
>>>>> on Mon, 16 Sep 2024 11:21:55 -0400 writes:
> Suppose we have `dat` shown below and we want to find the `y` value
> corresponding to the last value in `x` equal to the corresponding component
> of `seek` and we wish to return an output the same length as `seek` using
> `findInterval` to perform the search. This returns the correct result:
> dat <- data.frame(x = c(2, 2, 3, 4, 4, 4),
>                   y = c(37, 12, 19, 30, 6, 15),
>                   seek = 1:6)
> zero2na <- function(x) replace(x, x == 0, NA)
> dat |>
>   transform(result = y[ zero2na(findInterval(seek, x)) ] ) |>
>   _$result
> ## [1] NA 12 19 15 15 15
I'd write that as
with(dat, y[ zero2na(findInterval(seek, x)) ] )
so I can read it without jumping through hoops or standing on my head ...
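
For readers who want to run the with() variant on its own, here is a minimal self-contained sketch. It reuses `dat` and `zero2na` from the quoted message; the intermediate names `idx` and `res` are only introduced here for illustration.

```r
dat <- data.frame(x = c(2, 2, 3, 4, 4, 4),
                  y = c(37, 12, 19, 30, 6, 15),
                  seek = 1:6)
zero2na <- function(i) replace(i, i == 0, NA)

## index of the last x[] that is <= seek[i]; 0 if seek[i] < x[1]
idx <- with(dat, findInterval(seek, x))
idx
## [1] 0 2 3 6 6 6

## a 0 index would silently drop an element, hence the mapping 0 -> NA
res <- with(dat, y[ zero2na(idx) ])
res
## [1] NA 12 19 15 15 15
```

Note that seek = 5 and 6 still get y[6], because findInterval() looks for the interval containing the value, not for an exact match.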
> Since `findInterval` returns an index it is natural that the next step be
> to use the index and it is also common that we want a result that is the
> same length as the input.
I think your example, where x and y are of the same length, is
not typical.
Note that the design of findInterval(x, vec, ..) is indeed to always return
an index; there is no "nomatch" case, but rather a
- "left of the leftmost", i.e., an x[i] < vec[1] (as 'vec' must be
  sorted increasingly), or a
- "right of the rightmost", i.e., an x[i] > vec[length(vec)],
and these should give *different* results (and not both the
same).
I don't think 'nomatch' would improve the relatively clean findInterval()
behavior.
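
A small sketch of the two out-of-range cases, using default arguments only. The vectors and the names `left` and `right` are made up for illustration and are not part of findInterval():

```r
vec <- c(5, 10, 15)          # must be sorted increasingly
x   <- c(1, 7, 20)

i <- findInterval(x, vec)
i
## [1] 0 1 3

left  <- x < vec[1]              # "left of the leftmost"
right <- x > vec[length(vec)]    # "right of the rightmost"
i[left]     # index 0
i[right]    # index length(vec), here 3
```

A single 'nomatch' value would map both 0 and 3 to the same thing, which is exactly the information one would lose.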
There are three logical switches ... which allow 2^3
variants of which I now guess only 6 differ:
Here's some R code showing the possibilities:
(argsTF <- names(formals(findInterval))[-(1:2)]) # "rightmost.closed" "all.inside" "left.open"
FT <- c(FALSE, TRUE)
allFT <- as.matrix(expand.grid(rightmost.closed = FT,
                               all.inside = FT,
                               left.open = FT))
allFT
(cn <- substr(colnames(allFT), 1,1)) # "r" "a" "l"
x <- 2:18
v <- c(5, 10, 15) # create two bins [5,10) and [10,15)
fiAll <- apply(allFT, 1, function(r.a.f)
    do.call(findInterval, c(list(x, v), as.list(r.a.f))))
cbind(x, fiAll) # has all info
## must find cool 'column names' for fiAll: construct from r.., a.., l.. = F / T
(cn1 <- apply(`dim<-`(c(".","|")[allFT+1L], dim(allFT)), 1, paste0, collapse=""))
## "..." "|.." ".|." "||." "..|" "|.|" ".||" "|||"
colnames(fiAll) <- cn1
cbind(x, fiAll) ## --> col. 3 == 4 and 7 == 8
##==> show only unique columns:
cbind(x, t(unique(t(fiAll))))
##  x ... |.. .|. ..| |.| .||
##  2   0   0   1   0   0   1
##  3   0   0   1   0   0   1
##  4   0   0   1   0   0   1
##  5   1   1   1   0   1   1
##  6   1   1   1   1   1   1
##  7   1   1   1   1   1   1
##  8   1   1   1   1   1   1
##  9   1   1   1   1   1   1
## 10   2   2   2   1   1   1
## 11   2   2   2   2   2   2
## 12   2   2   2   2   2   2
## 13   2   2   2   2   2   2
## 14   2   2   2   2   2   2
## 15   3   2   2   2   2   2
## 16   3   3   2   3   3   2
## 17   3   3   2   3   3   2
## 18   3   3   2   3   3   2
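
The two duplicated columns (3 == 4 and 7 == 8) can also be checked directly. This is only a sketch for the x and v used above; `same` is a name introduced here. It compares all.inside = TRUE with and without rightmost.closed, once for each setting of left.open:

```r
x <- 2:18
v <- c(5, 10, 15)

## for this x and v, rightmost.closed makes no difference once all.inside = TRUE
same <- vapply(c(FALSE, TRUE), function(lo)
    identical(findInterval(x, v, all.inside = TRUE, left.open = lo),
              findInterval(x, v, all.inside = TRUE, left.open = lo,
                           rightmost.closed = TRUE)),
    logical(1))
same
## [1] TRUE TRUE

## all.inside = TRUE keeps every index within 1 .. length(v) - 1
range(findInterval(x, v, all.inside = TRUE))
## [1] 1 2
```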
Martin