findInterval {base}  R Documentation 
Given a vector of nondecreasing breakpoints in vec, find the interval containing each element of x; i.e., if i <- findInterval(x, v), then for each index j in x
$$v_{i_j} \le x_j < v_{i_j + 1}$$
where $v_0 := -\infty$, $v_{N+1} := +\infty$, and N <- length(v).
At the two boundaries, the returned index may differ by 1, depending on the optional arguments rightmost.closed and all.inside.
findInterval(x, vec, rightmost.closed = FALSE, all.inside = FALSE,
left.open = FALSE)
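x: numeric.
vec: numeric, sorted (weakly) increasingly, of length N, say.
rightmost.closed: logical; if true, the rightmost interval, vec[N-1] .. vec[N], is treated as closed (see below).
all.inside: logical; if true, the returned indices are coerced into 1, ..., N-1, i.e., 0 is mapped to 1 and N to N-1.
left.open: logical; if true, all the intervals are open at left and closed at right; in the interval formula above, $\le$ should be swapped with $<$ (and $>$ with $\ge$), and rightmost.closed means "leftmost is closed".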
The function findInterval finds the index of one vector x in another, vec, where the latter must be nondecreasing. Although the result is equivalent to the trivial apply(outer(x, vec, `>=`), 1, sum), the internal algorithm uses interval search, ensuring $O(n \log N)$ complexity where n <- length(x) (and N <- length(vec)). For (almost) sorted x, it is even faster, basically $O(n)$.
This is the same computation as for the empirical distribution function, and indeed, findInterval(t, sort(X)) is identical to $n F_n(t; X_1, \dots, X_n)$ where $F_n$ is the empirical distribution function of $X_1, \dots, X_n$.
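The link to the empirical distribution function can be illustrated with stats::ecdf(). A sketch on a small made-up sample that deliberately contains a tie; since ecdf() returns proportions, the comparison uses all.equal() to allow for floating-point rounding in $n F_n(t)$:

```r
# Made-up sample with a tie (the value 1 occurs twice).
X  <- c(3, 1, 4, 1, 5, 9, 2, 6)
n  <- length(X)
tt <- c(0, 1, 1.5, 4, 10)            # evaluation points

counts <- findInterval(tt, sort(X))  # number of X[i] <= tt
Fn     <- ecdf(X)                    # right-continuous step function

counts                                           # 0 2 2 5 8
isTRUE(all.equal(as.numeric(counts), n * Fn(tt)))  # TRUE
```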
When rightmost.closed = TRUE, the result for x[j] = vec[N] (= max(vec)) is N - 1, as for all other values in the last interval.
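The effect of the two boundary arguments shows up on a small made-up example in which x has values below, at, and above the breakpoint range. Only the point x = max(vec) changes under rightmost.closed = TRUE, while all.inside = TRUE maps 0 to 1 and N to N - 1:

```r
vec <- c(0, 1, 2, 3)                # N = 4 breakpoints, 3 finite intervals
x   <- c(-0.5, 0, 1.5, 3, 3.5)

res_default <- findInterval(x, vec)                           # 0 1 2 4 4
res_closed  <- findInterval(x, vec, rightmost.closed = TRUE)  # 0 1 2 3 4
res_inside  <- findInterval(x, vec, all.inside = TRUE)        # 1 1 2 3 3
```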
left.open = TRUE is occasionally useful, e.g., for survival data.
For (anti-)symmetry reasons, it is equivalent to using “mirrored” data, i.e., the following is always true:
identical(findInterval( x,  v,      left.open = TRUE,  ...),
          N - findInterval(-x, -v[N:1], left.open = FALSE, ...))
where N <- length(v), the number of breakpoints as above.
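The two conventions differ only for values of x that fall exactly on a breakpoint. A sketch on made-up data, which also checks the mirrored identity for this particular input:

```r
v <- c(0, 1, 2)                     # N = 3 breakpoints
x <- c(0, 0.5, 1, 2, 2.5)
N <- length(v)

right_open <- findInterval(x, v)                    # [v_i, v_{i+1}):  1 1 2 3 3
left_open  <- findInterval(x, v, left.open = TRUE)  # (v_i, v_{i+1}]:  0 1 1 2 3

# The mirrored identity, evaluated on this data:
mirrored <- N - findInterval(-x, -v[N:1])
identical(left_open, mirrored)      # TRUE
```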
Vector of length length(x) with values in 0:N (and NA) where N <- length(vec), or values coerced to 1:(N-1) if and only if all.inside = TRUE (equivalently, coercing all x values inside the intervals). Note that NAs are propagated from x, and Inf values are allowed in both x and vec.
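A short sketch of the NA and Inf handling, using made-up vectors: NA in x yields NA, -Inf falls before the first breakpoint, and an infinite breakpoint in vec is accepted:

```r
vec <- c(0, 10, Inf)                # an infinite breakpoint; N = 3
x   <- c(NA, -Inf, 5, 10, Inf)

res <- findInterval(x, vec)         # NA 0 1 2 3
```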
Martin Maechler
approx(*, method = "constant"), which is a generalization of findInterval(); ecdf for computing the empirical distribution function, which is (up to a factor of n) also basically the same as findInterval(.).
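The relation to approx() can be illustrated as follows. This sketch assumes the breakpoints in v are distinct (with ties, approx() would need an explicit ties argument), and the data are made up. A right-continuous constant interpolation (f = 0) of the indices 1, ..., N over the breakpoints reproduces findInterval(), with yleft and yright supplying the values 0 and N outside range(v):

```r
v <- c(0, 2.5, 5, 10)               # distinct breakpoints (assumption)
x <- c(-1, 0, 2.5, 5, 7, 10, 12)

idx <- findInterval(x, v)

# Step function through the points (v[i], i), right-continuous for f = 0,
# extended by 0 to the left of v[1] and by N to the right of v[N].
step <- approx(v, seq_along(v), xout = x, method = "constant", f = 0,
               yleft = 0, yright = length(v))$y

identical(idx, as.integer(step))    # TRUE
```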
x <- 2:18
v <- c(5, 10, 15) # create two bins [5,10) and [10,15)
cbind(x, findInterval(x, v))

N <- 100
X <- sort(round(stats::rt(N, df = 2), 2))
tt <- c(-100, seq(-2, 2, length.out = 201), +100)
it <- findInterval(tt, X)
tt[it < 1 | it >= N] # only first and last are outside range(X)

## 'left.open = TRUE' means "mirroring" :
N <- length(v)
stopifnot(identical(
              findInterval( x,  v,      left.open = TRUE),
          N - findInterval(-x, -v[N:1])))