[Rd] findInterval
Gabor Grothendieck
ggrothendieck at gmail.com
Tue Sep 17 19:27:35 CEST 2024
The other problem in this example is setting the NAs:
  replace(x, x == 0, NA)
requires two instances of x, which makes it not very pipe friendly. dplyr
has na_if to address that problem, and now that base R has pipes it might
offer something similar so that we don't have to define our own zero2na.
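
To make the pipe-friendliness point concrete, here is a minimal sketch. It assumes R >= 4.1 for the native pipe and the `\(v)` lambda syntax; `zero2na` is the user-defined helper from the example quoted below, not a base R function, and the vector `x` is made up for illustration.

```r
# User-defined helper (hypothetical; not part of base R).
zero2na <- function(x) replace(x, x == 0, NA)

x <- c(3, 0, 5, 0)   # illustrative data only

# With a named helper, x appears once and the call pipes cleanly.
r1 <- x |> zero2na()

# Without a helper, an anonymous function is needed so that the
# piped value can be referenced twice inside replace().
r2 <- x |> (\(v) replace(v, v == 0, NA))()

stopifnot(identical(r1, r2))
print(r1)
```

Both forms give the same result; the second only shows what has to be written when no helper is available.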
On Tue, Sep 17, 2024 at 12:14 PM Martin Maechler
<maechler using stat.math.ethz.ch> wrote:
>
> >>>>> Gabor Grothendieck
> >>>>> on Mon, 16 Sep 2024 11:21:55 -0400 writes:
>
> > Suppose we have `dat` shown below and we want to find the `y` value
> > corresponding to the last value in `x` equal to the corresponding component
> > of `seek` and we wish to return an output the same length as `seek` using
> > `findInterval` to perform the search. This returns the correct result:
>
> > dat <- data.frame(x = c(2, 2, 3, 4, 4, 4),
> >                   y = c(37, 12, 19, 30, 6, 15),
> >                   seek = 1:6)
>
> > zero2na <- function(x) replace(x, x == 0, NA)
>
> > dat |>
> >   transform(result = y[ zero2na(findInterval(seek, x)) ] ) |>
> >   _$result
> > ## [1] NA 12 19 15 15 15
>
> I'd write that as
>
> with(dat, y[ zero2na(findInterval(seek, x)) ] )
>
> so I can read it without jumping through hoops and standing on my head ...
>
> > Since `findInterval` returns an index it is natural that the next step be
> > to use the index and it is also common that we want a result that is the
> > same length as the input.
>
> I think your example, where x and y are of the same length, is
> not typical.
>
> Note that the design of findInterval(x, vec, ..) is indeed to always return
> an index; there isn't any "nomatch", but rather a
> - "left of the leftmost", i.e., an x[i] < vec[1] (as 'vec' must be
> sorted increasingly) or
> - "right of rightmost" , i.e., an x[i] > vec[length(vec)]
>
> and these should give *different* results (and not both the
> same).
>
> I don't think 'nomatch' would improve the relatively clean findInterval()
> behavior.
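
As a small illustration of the two out-of-range cases described above, here is a sketch with made-up values, using only the default arguments of findInterval(). Mapping 0 to NA at the end is a caller-side choice, shown here as an assumption about what a user might want, not something findInterval() does itself.

```r
vec <- c(5, 10, 15)   # break points; must be sorted increasingly
x   <- c(2, 7, 20)    # below the range, inside it, above it

idx <- findInterval(x, vec)
print(idx)
# 2  is left of vec[1], so its index is 0
# 7  lies in [5, 10), so its index is 1
# 20 is right of vec[3], so its index is length(vec), i.e. 3

# Caller-side post-processing: treat "left of the leftmost" as missing.
idx_na <- replace(idx, idx == 0L, NA)
print(idx_na)
```

The two out-of-range cases stay distinguishable in `idx`, which is the property a single 'nomatch' value would lose.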
>
> There are three logical switches ... which allow 2^3
> variants of which I now guess only 6 differ:
>
> Here's some R code showing the possibilities:
>
>
> (argsTF <- names(formals(findInterval))[-(1:2)]) # "rightmost.closed" "all.inside" "left.open"
> FT <- c(FALSE, TRUE)
> allFT <- as.matrix(expand.grid(rightmost.closed = FT,
>                                all.inside = FT,
>                                left.open = FT))
> allFT
> (cn <- substr(colnames(allFT), 1,1)) # "r" "a" "l"
>
> x <- 2:18
> v <- c(5, 10, 15) # create two bins [5,10) and [10,15)
>
> fiAll <- apply(allFT, 1, function(r.a.f)
>     do.call(findInterval, c(list(x, v), as.list(r.a.f))))
>
> cbind(x, fiAll) # has all info
>
> ## must find cool 'column names' for fiAll: construct from r.., a.., l.. = F / T
> (cn1 <- apply(`dim<-`(c(".","|")[allFT+1L], dim(allFT)), 1, paste0, collapse=""))
> ## "..." "|.." ".|." "||." "..|" "|.|" ".||" "|||"
> colnames(fiAll) <- cn1
> cbind(x, fiAll) ## --> col. 3 == 4 and 7 == 8
> ##==> show only unique columns:
> cbind(x, t(unique(t(fiAll))))
> ##  x ... |.. .|. ..| |.| .||
> ##  2   0   0   1   0   0   1
> ##  3   0   0   1   0   0   1
> ##  4   0   0   1   0   0   1
> ##  5   1   1   1   0   1   1
> ##  6   1   1   1   1   1   1
> ##  7   1   1   1   1   1   1
> ##  8   1   1   1   1   1   1
> ##  9   1   1   1   1   1   1
> ## 10   2   2   2   1   1   1
> ## 11   2   2   2   2   2   2
> ## 12   2   2   2   2   2   2
> ## 13   2   2   2   2   2   2
> ## 14   2   2   2   2   2   2
> ## 15   3   2   2   2   2   2
> ## 16   3   3   2   3   3   2
> ## 17   3   3   2   3   3   2
> ## 18   3   3   2   3   3   2
>
>
> Martin
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com