The goal of the *stratallo* package is to provide implementations
of efficient algorithms that solve a classical problem in survey
methodology: the optimum sample allocation problem in stratified
sampling schemes. In this context, the classical problem of optimum
sample allocation is understood in the Tschuprov-Neyman sense (Neyman 1934; Tschuprov 1923). It is formulated
as the determination of a vector of strata sample sizes that minimizes the
variance of the \(\pi\)-estimator of
the population total of a given study variable, under a constraint on
the total sample size. This problem can be further complemented by adding
lower-bound or upper-bound constraints on the sample sizes in strata.

A minor modification of the classical optimum sample allocation problem leads to the minimum sample size allocation problem. It consists in determining a vector of strata sample sizes that minimizes the total sample size, under an assumed fixed level of the \(\pi\)-estimator’s variance. As in the case of the classical optimum allocation, the minimum sample size allocation problem can be complemented by imposing upper-bound constraints on the sample sizes in strata.

The *stratallo* package provides two user functions, `dopt` and `nopt`, that solve the sample allocation problems briefly
characterized above. In this context, it is assumed that the sampling
designs in strata are chosen so that the variance of the \(\pi\)-estimator of the population total is
of the following generic form: \[
D^2_{st}(x_w,\, w \in \mathcal W) = \sum_{w \in \mathcal W}\,
\frac{a_w^2}{x_w} - b,
\] where \(\mathcal W = \{1, \ldots, H\}\) denotes the set of strata labels,
\(H\) is the total number of strata, \((x_w)_{w \in \mathcal W}\) are the strata
sample sizes, and the parameters \(b\) and
\(a_w > 0,\, w \in \mathcal W\), do
not depend on \((x_w)_{w \in \mathcal W}\). Among the stratified sampling
designs whose \(\pi\)-estimator variance has the above
form is stratified simple random sampling without replacement.
Under this design, \(a_w = N_w S_w,\, w \in \mathcal W\), and
\(b = \sum_{w \in \mathcal W}\, N_w S_w^2\), where \(S_w,\, w \in \mathcal W\), denote the stratum
standard deviations of the study variable and \(N_w,\, w \in \mathcal W\), are the strata
sizes (see e.g. Särndal et al. (1993),
Result 3.7.2, p. 103).
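
As a quick illustration of this variance form, the base R sketch below evaluates \(D^2_{st}\) for stratified simple random sampling without replacement. The helper name `var_generic`, the population values and the allocation `x` are assumptions made for this example only; they are not part of the package.

```r
# Hypothetical helper (not part of stratallo): generic variance
# D^2_st(x) = sum(a_w^2 / x_w) - b.
var_generic <- function(x, a, b) {
  stopifnot(length(x) == length(a), all(x > 0), all(a > 0))
  sum(a^2 / x) - b
}

# Illustrative population with 4 strata (assumed values).
N <- c(3000, 4000, 5000, 2000) # Strata sizes.
S <- c(48, 79, 76, 17)         # Stratum standard deviations.

# Parameters under stratified simple random sampling without replacement.
a <- N * S
b <- sum(N * S^2)

# Variance for an arbitrary (assumed) allocation.
x <- c(30, 70, 80, 10)
v <- var_generic(x, a, b)

# Sanity check: a census (x_w = N_w) leaves no sampling variance.
v_census <- var_generic(N, a, b)
```

The census check follows directly from the formulas: with \(x_w = N_w\), the first term becomes \(\sum_{w} N_w S_w^2\), which cancels \(b\).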

Apart from `dopt` and `nopt`, *stratallo* provides the `var_tst` and
`var_tst_si` functions, which compute the value of the variance \(D^2_{st}\).
The `var_tst_si` function is a simple wrapper of `var_tst` dedicated to the case
of simple random sampling without replacement in each stratum.
Furthermore, the package comes with two predefined, artificial
populations with 507 and 969 strata. These are stored in the
`pop507` and `pop969` objects, respectively.

`dopt` function

The `dopt` function solves the following three types of
the allocation problem, formulated in the language of mathematical
optimization.

**Problem 1 (one-sided upper bounds constraints)**

Given numbers \(a_w > 0,\, M_w > 0,\, w
\in \mathcal W\) and \(b,\, n \le
\sum_{w \in \mathcal W}\, M_w\), \[\begin{align*}
\underset{\mathbf x\in (0, +\infty)^{H}}{\mathrm{minimize ~\,}}
& \quad f(\mathbf x) = \sum_{w \in \mathcal W} \tfrac{a_w^2}{x_w} -
b \\
\mathrm{subject ~ to} & \quad \sum_{w \in \mathcal W} x_w = n \\
& \quad x_w \le M_w, \quad \forall w \in \mathcal W,
\end{align*}\] where \(\mathbf x=
(x_w)_{w \in \mathcal W}\) is the optimization variable.

**Problem 2 (one-sided lower bounds constraints)**

Given numbers \(a_w > 0,\, m_w > 0,\, w
\in \mathcal W\), and \(b,\, n \ge
\sum_{w \in \mathcal W} m_w\), \[\begin{align*}
\underset{\mathbf x\in (0, +\infty)^{H}}{\mathrm{minimize ~\,}}
& \quad f(\mathbf x) = \sum_{w \in \mathcal W} \tfrac{a_w^2}{x_w} -
b \\
\mathrm{subject ~ to} & \quad \sum_{w \in \mathcal W} x_w = n \\
& \quad x_w \ge m_w, \quad \forall w \in \mathcal W,
\end{align*}\] where \(\mathbf x=
(x_w)_{w \in \mathcal W}\) is the optimization variable.

**Problem 3 (box-constraints)**

Given numbers \(a_w > 0,\, 0 < m_w <
M_w,\, w \in \mathcal W\), and \(b,\,
\sum_{w \in \mathcal W} m_w \le n \le \sum_{w \in \mathcal W}
M_w\), \[\begin{align*}
\underset{\mathbf x\in (0, +\infty)^{H}}{\mathrm{minimize ~\,}}
& \quad f(\mathbf x) = \sum_{w \in \mathcal W} \tfrac{a_w^2}{x_w} -
b \\
\mathrm{subject ~ to} & \quad \sum_{w \in \mathcal W} x_w = n \\
& \quad x_w \ge m_w, \quad \forall w \in \mathcal W, \\
& \quad x_w \le M_w, \quad \forall w \in \mathcal W,
\end{align*}\] where \(\mathbf x=
(x_w)_{w \in \mathcal W}\) is the optimization variable.
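
The constraints of Problems 1 to 3 can be checked mechanically. The following base R sketch tests whether a candidate vector `x` is feasible; the helper name `is_feasible`, the tolerance `tol` and the candidate vectors are assumptions introduced for illustration and are not part of the package.

```r
# Hypothetical helper (not part of stratallo): checks whether x satisfies
# the constraints of Problem 1, 2 or 3, depending on which bounds are given.
is_feasible <- function(x, n, m = NULL, M = NULL, tol = 1e-8) {
  ok <- all(x > 0) && abs(sum(x) - n) <= tol
  if (!is.null(m)) ok <- ok && all(x >= m - tol) # Lower bounds (Problems 2, 3).
  if (!is.null(M)) ok <- ok && all(x <= M + tol) # Upper bounds (Problems 1, 3).
  ok
}

m <- c(100, 90, 500, 50)  # Assumed lower bounds.
M <- c(300, 400, 800, 90) # Assumed upper bounds.
n <- 1284

# A feasible point exists only if sum(m) <= n <= sum(M).
bounds_compatible <- sum(m) <= n && n <= sum(M)

feasible_box <- is_feasible(c(230, 400, 600, 54), n, m = m, M = M)
infeasible_box <- is_feasible(c(230, 410, 590, 54), n, m = m, M = M) # x_2 > M_2.
```

Feasibility says nothing about optimality; it only confirms that a vector belongs to the set over which the variance is minimized.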

Users of `dopt` can choose whether the computed solution is for
**Problem 1**, **Problem 2** or **Problem 3**. This is achieved through the
`m` and `M` arguments of the function. For **Problem 1**, the user provides
the upper bounds with the `M` argument, while leaving `m` as `NULL`.
Similarly, for **Problem 2**, the user provides the lower bounds with the
`m` argument, while leaving `M` as `NULL`. For **Problem 3**, both `m` and
`M` must be specified. If both `m` and `M` are `NULL` (default), `dopt`
returns the Tschuprov-Neyman allocation, which minimizes the variance
\(D^2_{st}\) under the constraint on the total sample size
\(\sum_{w \in \mathcal W} x_w = n\) and is given by \[
x_w = a_w \frac{n}{\sum_{v \in \mathcal W} a_v}, \quad w \in \mathcal W.
\]

Four algorithms are available for **Problem 1**: *rna* (default), *sga*,
*sgaplus* and *coma*. All of them, except *sgaplus*, are described in detail
in Wesołowski et al. (2021). The *sgaplus* algorithm is defined in
Wójciak (2019) as the *Sequential Allocation (version 1)* algorithm.
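
To make the idea behind a recursive Neyman-type procedure concrete, here is a minimal base R sketch for upper bounds. It repeatedly applies the Tschuprov-Neyman formula to the strata that are still free and fixes any stratum that exceeds its bound. The function name `rna_sketch` and the input values are assumptions for illustration; this is not the package's implementation, and `dopt` should be used in practice.

```r
# Illustrative sketch only (hypothetical name, not the package implementation).
rna_sketch <- function(n, a, M) {
  stopifnot(all(a > 0), all(M > 0), n > 0, n <= sum(M))
  x <- numeric(length(a))
  active <- rep(TRUE, length(a)) # Strata not yet fixed at their upper bound.
  repeat {
    # Tschuprov-Neyman allocation of the remaining sample among active strata.
    x[active] <- a[active] * n / sum(a[active])
    over <- active & (x > M)
    if (!any(over)) break
    # Fix violating strata at their bounds and reallocate the rest.
    x[over] <- M[over]
    n <- n - sum(M[over])
    active <- active & !over
  }
  x
}

a <- c(3000, 4000, 5000, 2000) * c(48, 79, 76, 17) # a_w = N_w * S_w (assumed data).
M <- c(100, 90, 70, 80)
x_upper <- rna_sketch(n = 190, a = a, M = M)
```

With these inputs, the first pass assigns more than 70 units to the third stratum, so that stratum is fixed at 70 and the remaining 120 units are split proportionally to `a` among the other three strata.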

The optimization **Problem 2** is solved by the *lrna* algorithm, which
is based on *rna* and was introduced in Wójciak (2022).
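
The lower-bound case can be sketched in the same spirit. The base R code below assumes that the procedure mirrors the upper-bound recursion with the inequality reversed: strata that fall below their lower bound are fixed at that bound and the rest of the sample is reallocated. The name `lrna_sketch` and the input values are hypothetical, and the code is an illustration rather than the package's implementation.

```r
# Illustrative sketch only (hypothetical name, not the package implementation).
# Assumption: lower bounds are handled by mirroring the upper-bound recursion.
lrna_sketch <- function(n, a, m) {
  stopifnot(all(a > 0), all(m > 0), n >= sum(m))
  x <- numeric(length(a))
  active <- rep(TRUE, length(a)) # Strata not yet fixed at their lower bound.
  repeat {
    # Tschuprov-Neyman allocation of the remaining sample among active strata.
    x[active] <- a[active] * n / sum(a[active])
    under <- active & (x < m)
    if (!any(under)) break
    # Fix violating strata at their lower bounds and reallocate the rest.
    x[under] <- m[under]
    n <- n - sum(m[under])
    active <- active & !under
  }
  x
}

a <- c(3000, 4000, 5000, 2000) * c(48, 79, 76, 17) # a_w = N_w * S_w (assumed data).
m <- c(50, 120, 1, 1)
x_lower <- lrna_sketch(n = 190, a = a, m = m)
```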

The optimization **Problem 3** is solved by *rnabox*, a new algorithm
proposed by the authors of this package, which will be published soon.

`nopt` function

The `nopt` function solves the following minimum sample
size allocation problem, formulated in the language of mathematical
optimization.

**Problem 4**

Given numbers \(a_w > 0,\, M_w > 0,\, w
\in \mathcal W\), and \(b,\, D >
\sum_{w \in \mathcal W} \tfrac{a_w^2}{M_w} - b > 0\), \[\begin{align*}
\underset{\mathbf x\in (0, +\infty)^{H}}{\mathrm{minimize ~\,}}
& \quad n(\mathbf x) = \sum_{w \in \mathcal W} x_w \\
\mathrm{subject ~ to} & \quad \sum_{w \in \mathcal W}
\tfrac{a_w^2}{x_w} - b = D \\
& \quad x_w \le M_w, \quad \forall w \in \mathcal W,
\end{align*}\] where \(\mathbf x=
(x_w)_{w \in \mathcal W}\) is the optimization variable.

The algorithm that solves **Problem 4** is based on *lrna* and is
described in Wójciak (2022).
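
In the special case where no upper bound is active, **Problem 4** has a closed-form solution that follows from the Lagrange conditions: \(x_w = a_w \sum_{v \in \mathcal W} a_v / (D + b)\). The base R sketch below evaluates this candidate and checks it against the bounds. It covers only this special case and is not the algorithm used by `nopt`; the input values are assumptions chosen for illustration.

```r
# Assumed example inputs.
a <- c(3000, 4000, 5000, 2000)
b <- 70000
M <- c(100, 90, 70, 80) # Upper bounds.
D <- 1e6                # Variance constraint.

# Candidate solution that ignores the upper bounds.
x <- a * sum(a) / (D + b)

# The candidate solves Problem 4 only if it respects all upper bounds.
bounds_ok <- all(x <= M)

# The variance constraint holds with equality; the total sample size follows.
variance <- sum(a^2 / x) - b
n_min <- sum(x)
```

If `bounds_ok` were `FALSE`, some strata would have to be fixed at their upper bounds and the remaining ones recomputed, which is the situation the dedicated algorithm handles.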

You can install the released version of the *stratallo* package
from CRAN with:

`install.packages("stratallo")`

These are basic examples that show how to use the `dopt` and `nopt`
functions to solve optimal sample allocation problems
for an example population with 4 strata.

`library(stratallo)`

`dopt`

```
# Define example population.
N <- c(3000, 4000, 5000, 2000) # Strata sizes.
S <- c(48, 79, 76, 17) # Standard deviations of a study variable in strata.
a <- N * S
n <- 190 # Total sample size.
```

```
# Tschuprov-Neyman allocation (no bounds on the sample sizes in strata).
opt <- dopt(n = n, a = a)
opt
#> [1] 31.304348 68.695652 82.608696 7.391304
sum(opt) == n
#> [1] TRUE
# Variance of the pi-estimator that corresponds to a given optimal allocation.
var_tst_si(opt, N, S)
#> [1] 3959066000
```

```
M <- c(100, 90, 70, 80) # Upper bounds constraints imposed on the sample sizes in strata.
all(M <= N)
#> [1] TRUE
n < sum(M)
#> [1] TRUE
# Solution to Problem 1.
opt <- dopt(n = n, a = a, M = M)
opt
#> [1] 34.979757 76.761134 70.000000 8.259109
sum(opt) == n
#> [1] TRUE
all(opt <= M) # Does not violate upper bounds constraints.
#> [1] TRUE
# Variance of the pi-estimator that corresponds to a given optimal allocation.
var_tst_si(opt, N, S)
#> [1] 4035156476
```

```
m <- c(50, 120, 1, 1) # Lower bounds constraints imposed on the sample sizes in strata.
n > sum(m)
#> [1] TRUE
# Solution to Problem 2.
opt <- dopt(n = n, a = a, m = m)
opt
#> [1] 50.000000 120.000000 18.357488 1.642512
sum(opt) == n
#> [1] TRUE
all(opt >= m) # Does not violate lower bounds constraints.
#> [1] TRUE
# Variance of the pi-estimator that corresponds to a given optimal allocation.
var_tst_si(opt, N, S)
#> [1] 9755319333
```

```
m <- c(100, 90, 500, 50) # Lower bounds constraints imposed on sample sizes in strata.
M <- c(300, 400, 800, 90) # Upper bounds constraints imposed on sample sizes in strata.
n <- 1284
n > sum(m) && n < sum(M)
#> [1] TRUE
# Optimal allocation under box-constraints.
opt <- dopt(n = n, a = a, m = m, M = M)
opt
#> [1] 228.1290 400.0000 602.0072 53.8638
sum(opt) == n
#> [1] TRUE
all(opt >= m & opt <= M) # Does not violate any lower or upper bounds constraints.
#> [1] TRUE
# Variance of the pi-estimator that corresponds to a given optimal allocation.
var_tst_si(opt, N, S)
#> [1] 540527719
```

`nopt`

```
a <- c(3000, 4000, 5000, 2000)
b <- 70000
M <- c(100, 90, 70, 80)
D <- 1e6 # Variance constraint.

opt <- nopt(D, a, b, M)
sum(opt)
#> [1] 183.1776
```

Neyman, J. (1934), “On the Two Different Aspects of the
Representative Method: The Method of Stratified Sampling and the Method
of Purposive Selection,” *Journal of the Royal Statistical
Society*, 97, 558–606.

Särndal, C.-E., Swensson, B., and Wretman, J. (1993), *Model Assisted
Survey Sampling*, Springer.

Tschuprov, A. A. (1923), “On the Mathematical Expectation of the
Moments of Frequency Distributions in the Case of Correlated
Observations,” *Metron*, 2, 461–493, 646–683.

Wesołowski, J., Wieczorkowski, R., and Wójciak, W. (2021),
“Optimality of the Recursive Neyman Allocation,”
*Journal of Survey Statistics and Methodology*. https://doi.org/10.1093/jssam/smab018.
https://arxiv.org/abs/2105.14486.

Wójciak, W. (2019), “Optimal Allocation in Stratified Sampling
Schemes,” *MSc Thesis*, Warsaw University of Technology.
http://home.elka.pw.edu.pl/~wwojciak/msc_optimal_allocation.pdf.

Wójciak, W. (2022), “Minimum Sample Size Allocation in Stratified
Sampling Under Constraints on Variance and Strata Sample Sizes.”
https://doi.org/10.48550/arXiv.2204.04035.