Error estimation

This document mainly presents the function surveysd::calc.stError(), which generates point estimates and standard errors for user-supplied estimation functions.

Prerequisites

In order to use a dataset with calc.stError(), several weight columns have to be present. Each weight column corresponds to a bootstrap sample. In the following examples, we will use the data from demo.eusilc() and attach the bootstrap weights using draw.bootstrap() and recalib(). Please refer to the documentation of those functions for more detail.

library(surveysd)

set.seed(1234)
eusilc <- demo.eusilc(prettyNames = TRUE)
dat_boot <- draw.bootstrap(eusilc, REP = 10, hid = "hid", weights = "pWeight",
                           strata = "region", period = "year")
dat_boot_calib <- recalib(dat_boot, conP.var = "gender", conH.var = "region",
                          epsP = 1e-2, epsH = 2.5e-2, verbose = FALSE)
dat_boot_calib[, onePerson := nrow(.SD) == 1, by = .(year, hid)]

## print part of the dataset
dat_boot_calib[1:5, .(year, povertyRisk, eqIncome, onePerson, pWeight, w1, w2, w3, w4, w5)]

year	povertyRisk	eqIncome	onePerson	pWeight	w1	w2	w3	w4	w5
2010	FALSE	16090.69	FALSE	504.5696	1013.1805463	0.4502254	1001.5595	1015.8425	0.4456781
2010	FALSE	16090.69	FALSE	504.5696	1013.1805463	0.4502254	1001.5595	1015.8425	0.4456781
2010	FALSE	16090.69	FALSE	504.5696	1013.1805463	0.4502254	1001.5595	1015.8425	0.4456781
2010	FALSE	27076.24	FALSE	493.3824	0.4413742	0.4409086	975.1408	994.4018	979.7081838
2010	FALSE	27076.24	FALSE	493.3824	0.4413742	0.4409086	975.1408	994.4018	979.7081838

Estimator functions

The parameters fun and var in calc.stError() define the estimator to be used in the error analysis. There are two built-in estimator functions weightedSum() and weightedRatio() which can be used as follows.

povertyRate <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = weightedRatio)
totalIncome <- calc.stError(dat_boot_calib, var = "eqIncome", fun = weightedSum)

Those functions calculate the ratio of persons at risk of poverty (in percent) and the total income. By default, the results are calculated separately for each reference period.

povertyRate$Estimates

year	n	N	estimate_type	val_povertyRisk	stE_povertyRisk
2010	14827	8182222	direct	14.44422	0.3755538
2011	14827	8182222	direct	14.77393	0.2298196
2012	14827	8182222	direct	15.04515	0.2056325
2013	14827	8182222	direct	14.89013	0.4515894
2014	14827	8182222	direct	15.14556	0.4954098
2015	14827	8182222	direct	15.53640	0.5456595
2016	14827	8182222	direct	15.08315	0.5211549
2017	14827	8182222	direct	15.42019	0.3757101

totalIncome$Estimates

year	n	N	estimate_type	val_eqIncome	stE_eqIncome
2010	14827	8182222	direct	162750998071	904175758
2011	14827	8182222	direct	161926931417	1229058265
2012	14827	8182222	direct	162576509628	1903487229
2013	14827	8182222	direct	163199507862	1624805090
2014	14827	8182222	direct	163986275009	1464839665
2015	14827	8182222	direct	163416275447	1665569708
2016	14827	8182222	direct	162706205137	2073914048
2017	14827	8182222	direct	164314959107	2030896610
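To make the two estimators more concrete, the following sketch re-implements them in base R on a toy vector. The names sketchWeightedSum and sketchWeightedRatio are hypothetical stand-ins written for this illustration. They assume that weightedSum() returns the weighted total and weightedRatio() the weighted share of TRUE values in percent, as described above, and they may differ from the package internals (for example in how missing values are handled).

```r
# Hypothetical stand-ins for the built-in estimators (illustration only,
# not the surveysd implementation)
sketchWeightedSum <- function(x, w) {
  sum(as.numeric(x) * w)
}
sketchWeightedRatio <- function(x, w) {
  # weighted share of TRUE values, in percent
  sum(w[x]) / sum(w) * 100
}

# toy data: four persons, two of them at risk of poverty
atRisk <- c(TRUE, FALSE, TRUE, FALSE)
income <- c(10000, 30000, 12000, 25000)
w      <- c(100, 300, 200, 400)

ratio <- sketchWeightedRatio(atRisk, w)  # (100 + 200) / 1000 * 100
total <- sketchWeightedSum(income, w)    # sum of income * weight
```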

Columns that use the val_ prefix denote the point estimate belonging to the “main weight” of the dataset, which is pWeight in case of the dataset used here.

Columns with the stE_ prefix denote standard errors calculated from the bootstrap replicates, that is, from the estimates obtained by applying the estimator with w1, w2, …, w10 instead of pWeight.

n denotes the number of observations for the year and N denotes the total weight of those persons.
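The relation between the val_ and stE_ columns can be sketched in base R. The example below assumes that the standard error is taken as the standard deviation of the replicate estimates. The data frame toy, its replicate weights (random perturbations of the main weight, not real bootstrap weights) and the helper ratio are made up for this illustration.

```r
# toy data: two periods, a main weight and three replicate weights
set.seed(1)
toy <- data.frame(
  year    = rep(c(2010, 2011), each = 50),
  x       = rbinom(100, 1, 0.15) == 1,
  pWeight = runif(100, 400, 600)
)
for (b in 1:3) toy[[paste0("w", b)]] <- toy$pWeight * runif(100, 0.5, 1.5)

# estimator: weighted share of TRUE values in percent
ratio   <- function(x, w) sum(w[x]) / sum(w) * 100
repCols <- paste0("w", 1:3)

res <- do.call(rbind, lapply(split(toy, toy$year), function(d) {
  # point estimate uses the main weight
  val  <- ratio(d$x, d$pWeight)
  # one estimate per replicate weight column
  reps <- sapply(repCols, function(col) ratio(d$x, d[[col]]))
  # assumption: standard error = standard deviation of replicate estimates
  data.frame(year = d$year[1], val = val, stE = sd(reps))
}))
res
```

The estimator is evaluated once per period and weight column, which is the same call pattern that applies to user-supplied estimators.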

Custom estimators

In order to define a custom estimator function to be used in fun, the function needs to have at least two arguments like the example below.

## define custom estimator
myWeightedSum <- function(x, w) {
  sum(x*w)
}

## check if results are equal to the one using `surveysd::weightedSum()`
totalIncome2 <- calc.stError(dat_boot_calib, var = "eqIncome", fun = myWeightedSum)
all.equal(totalIncome$Estimates, totalIncome2$Estimates)

## [1] TRUE

The parameters x and w can be assumed to be vectors of equal length, with w being a numeric weight vector and x being the column defined in the var argument. The estimator is called once for each period (in this case year) and for each weight column (in this case pWeight, w1, w2, …, w10).

Custom estimators can also take additional parameters. The parameter add.arg is used to set these additional arguments for the custom estimator.

## use add.arg-argument
fun <- function(x, w, b) {
  sum(x*w*b)
}
add.arg = list(b="onePerson")

err.est <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = fun,
                        period.mean = 0, add.arg=add.arg)
err.est$Estimates

year	n	N	estimate_type	val_povertyRisk	stE_povertyRisk
2010	14827	8182222	direct	273683.9	14449.17
2011	14827	8182222	direct	261883.6	12029.86
2012	14827	8182222	direct	243083.9	13071.31
2013	14827	8182222	direct	238004.4	15764.40
2014	14827	8182222	direct	218572.1	16665.11
2015	14827	8182222	direct	219984.1	18322.78
2016	14827	8182222	direct	201753.9	14075.25
2017	14827	8182222	direct	196881.2	13604.54

# compare with direct computation
compare.value <- dat_boot_calib[,fun(povertyRisk,pWeight,b=onePerson),
                                 by=c("year")]
all((compare.value$V1-err.est$Estimates$val_povertyRisk)==0)

## [1] TRUE

The above chunk computes the weighted number of persons at risk of poverty who live in single-person households.
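One way to picture how add.arg is resolved is shown below: each entry names a column of the dataset, and that column is passed to the estimator under the entry's name. This is a base R sketch of the idea on a made-up data frame toy, not the code used inside calc.stError().

```r
fun <- function(x, w, b) {
  sum(x * w * b)
}

# toy data, invented for this illustration
toy <- data.frame(
  povertyRisk = c(TRUE, TRUE, FALSE, TRUE),
  onePerson   = c(TRUE, FALSE, TRUE, TRUE),
  pWeight     = c(100, 200, 300, 400)
)
add.arg <- list(b = "onePerson")

# replace each column name in add.arg by the column itself ...
extra <- lapply(add.arg, function(col) toy[[col]])
# ... and call the estimator with x, w and the extra arguments
val <- do.call(fun, c(list(x = toy$povertyRisk, w = toy$pWeight), extra))
val
```

Only rows where both povertyRisk and onePerson are TRUE contribute their weight to the result.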

Adjust variable depending on bootstrap weights

In our example the variable povertyRisk is a boolean and is TRUE if the income is less than 60% of the weighted median income. Thus it directly depends on the original weight vector pWeight. To further reduce the estimated error one should calculate for each bootstrap replicate weight $w$ the weighted median income $medIncome_{w}$ and then define $povertyRisk_w$ as

\[ povertyRisk_w = \begin{cases} 1 & \text{if } Income < 0.6\cdot medIncome_{w}\\ 0 & \text{else} \end{cases} \]

The estimator can then be applied to the new variable $povertyRisk_w$. This can be realized using a custom estimator function.
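Before turning to the estimator itself, the definition of $povertyRisk_w$ can be illustrated on toy data. The helper wMedian is a deliberately simple, hypothetical weighted median (the smallest value at which the cumulative weight share reaches 50%). It is not laeken::weightedMedian(), which may treat ties and interpolation differently. The income and weight vectors are invented.

```r
# hypothetical, simplified weighted median (illustration only)
wMedian <- function(x, w) {
  o  <- order(x)
  cw <- cumsum(w[o]) / sum(w)
  x[o][which(cw >= 0.5)[1]]
}

income  <- c(8000, 12000, 20000, 30000, 50000)
pWeight <- c(100, 100, 100, 100, 100)  # main weight
w1      <- c(300, 100, 50, 25, 25)     # one replicate weight

# indicator: income below 60% of the weighted median for weight vector w
povertyRiskFor <- function(w) {
  as.integer(income < 0.6 * wMedian(income, w))
}

risk_main <- povertyRiskFor(pWeight)  # threshold 0.6 * 20000 = 12000
risk_w1   <- povertyRiskFor(w1)       # threshold 0.6 * 8000  = 4800
```

The same person can therefore be classified differently depending on the weight vector, which is why the indicator is recomputed for each replicate.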

# custom estimator to first derive poverty threshold 
# and then estimate a weighted ratio
povmd <- function(x, w) {
 md <- laeken::weightedMedian(x, w)*0.6
 pmd60 <- x < md
 # weighted ratio is directly estimated inside the function
 return(sum(w[pmd60])/sum(w)*100)
}

err.est <- calc.stError(
  dat_boot_calib, var = "povertyRisk", fun = weightedRatio,
  fun.adjust.var = povmd, adjust.var = "eqIncome")
err.est$Estimates

year	n	N	estimate_type	val_povertyRisk
2010	14827	8182222	direct	14.44422
2011	14827	8182222	direct	14.77393
2012	14827	8182222	direct	15.04515
2013	14827	8182222	direct	14.89013
2014	14827	8182222	direct	15.14556
2015	14827	8182222	direct	15.53640
2016	14827	8182222	direct	15.08315
2017	14827	8182222	direct	15.42019

The approach shown above is only valid if no grouping variables are supplied (parameter group = NULL). If grouping variables are supplied one should use parameters fun.adjust.var and adjust.var such that the $povertyRisk_w$ is first calculated for each period and then used for each grouping in group.

# using fun.adjust.var and adjust.var to estimate povmd60 indicator
# for each period and bootstrap weight before applying the weightedRatio
povmd2 <- function(x, w) {
 md <- laeken::weightedMedian(x, w)*0.6
 pmd60 <- x < md
 return(as.integer(pmd60))
}

# set adjust.var="eqIncome" so the income vector is used to estimate
# the povmd60 indicator for each bootstrap weight
# and the resulting indicators are passed to function weightedRatio
group <- "gender"
err.est <- calc.stError(
  dat_boot_calib, var = "povertyRisk", fun = weightedRatio, group = group,
  fun.adjust.var = povmd2, adjust.var = "eqIncome")
err.est$Estimates

year	n	N	gender	estimate_type	val_povertyRisk	stE_povertyRisk
2010	7267	3979572	male	direct	12.02660	0.4858507
2010	7560	4202650	female	direct	16.73351	0.6959347
2010	14827	8182222	NA	direct	14.44422	0.5756880
2011	7267	3979572	male	direct	12.81921	0.2873416
2011	7560	4202650	female	direct	16.62488	0.3743578
2011	14827	8182222	NA	direct	14.77393	0.2694827
2012	7267	3979572	male	direct	13.76065	0.2865017
2012	7560	4202650	female	direct	16.26147	0.2689458
2012	14827	8182222	NA	direct	15.04515	0.1903772
2013	7267	3979572	male	direct	13.88962	0.4730442
2013	7560	4202650	female	direct	15.83754	0.1908739
2013	14827	8182222	NA	direct	14.89013	0.3074631
2014	7267	3979572	male	direct	14.50351	0.5042843
2014	7560	4202650	female	direct	15.75353	0.3463626
2014	14827	8182222	NA	direct	15.14556	0.3709321
2015	7267	3979572	male	direct	15.12289	0.6285688
2015	7560	4202650	female	direct	15.92796	0.4200607
2015	14827	8182222	NA	direct	15.53640	0.4914012
2016	7267	3979572	male	direct	14.57968	0.5546359
2016	7560	4202650	female	direct	15.55989	0.3072535
2016	14827	8182222	NA	direct	15.08315	0.4023717
2017	7267	3979572	male	direct	14.94816	0.4973673
2017	7560	4202650	female	direct	15.86717	0.6738396
2017	14827	8182222	NA	direct	15.42019	0.5689435

Multiple estimators

In case an estimator should be applied to several columns of the dataset, var can be set to a vector containing all necessary columns.

multipleRates <- calc.stError(dat_boot_calib, var = c("povertyRisk", "onePerson"), fun = weightedRatio)
multipleRates$Estimates

year	n	N	estimate_type	val_povertyRisk	stE_povertyRisk	val_onePerson	stE_onePerson
2010	14827	8182222	direct	14.44422	0.3942534	14.85737	0.3942534
2011	14827	8182222	direct	14.77393	0.3043969	14.85737	0.3043969
2012	14827	8182222	direct	15.04515	0.2895304	14.85737	0.2895304
2013	14827	8182222	direct	14.89013	0.3950952	14.85737	0.3950952
2014	14827	8182222	direct	15.14556	0.4561354	14.85737	0.4561354
2015	14827	8182222	direct	15.53640	0.6039997	14.85737	0.6039997
2016	14827	8182222	direct	15.08315	0.5295194	14.85737	0.5295194
2017	14827	8182222	direct	15.42019	0.6276176	14.85737	0.6276176

Here we see the share of persons at risk of poverty and the share of persons living in one-person households, both in percent.

Grouping

The group argument can be used to calculate estimators for different subsets of the data. This argument can take the grouping variable as a string that refers to a column name (usually a factor) in dat. If set, all estimators are split not only by the reference period but also by the grouping variable. For simplicity, only one reference period of the above data is used.

dat2 <- subset(dat_boot_calib, year == 2010)
for (att in c("period", "weights", "b.rep"))
  attr(dat2, att) <- attr(dat_boot_calib, att)

To calculate the ratio of persons at risk of poverty for each federal state of Austria, group = "region" can be used.

povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, group = "region")
povertyRates$Estimates

year	n	N	region	estimate_type	val_povertyRisk	stE_povertyRisk
2010	549	260564	Burgenland	direct	19.53984	3.8201446
2010	733	377355	Vorarlberg	direct	16.53731	3.4361247
2010	924	535451	Salzburg	direct	13.78734	1.8458914
2010	1078	563648	Carinthia	direct	13.08627	2.0096038
2010	1317	701899	Tyrol	direct	15.30819	1.8293976
2010	2295	1167045	Styria	direct	14.37464	1.0559605
2010	2322	1598931	Vienna	direct	17.23468	1.1871171
2010	2804	1555709	Lower Austria	direct	13.84362	1.1256995
2010	2805	1421620	Upper Austria	direct	10.88977	0.9377872
2010	14827	8182222	NA	direct	14.44422	0.3755538

The last row with region = NA denotes the aggregate over all regions. Note that the columns N and n now show the weighted and unweighted number of persons in each region.

Several grouping variables

If more than one grouping variable is used, there are several options for calling calc.stError(), depending on whether combinations of grouping levels should be considered or not. We will use the variables gender and region as our grouping variables and show three options for calling calc.stError().

Option 1: All regions and all genders

Calculate the point estimate and standard error for each region and each gender. The number of rows in the output is therefore

\[n_\text{periods}\cdot(n_\text{regions} + n_\text{genders} + 1) = 1\cdot(9 + 2 + 1) = 12.\]

The last row is again the estimate for the whole period.

povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, 
                             group = c("gender", "region"))
povertyRates$Estimates

year	n	N	gender	region	estimate_type	val_povertyRisk	stE_povertyRisk
2010	549	260564	NA	Burgenland	direct	19.53984	3.8201446
2010	733	377355	NA	Vorarlberg	direct	16.53731	3.4361247
2010	924	535451	NA	Salzburg	direct	13.78734	1.8458914
2010	1078	563648	NA	Carinthia	direct	13.08627	2.0096038
2010	1317	701899	NA	Tyrol	direct	15.30819	1.8293976
2010	2295	1167045	NA	Styria	direct	14.37464	1.0559605
2010	2322	1598931	NA	Vienna	direct	17.23468	1.1871171
2010	2804	1555709	NA	Lower Austria	direct	13.84362	1.1256995
2010	2805	1421620	NA	Upper Austria	direct	10.88977	0.9377872
2010	7267	3979572	male	NA	direct	12.02660	0.3524528
2010	7560	4202650	female	NA	direct	16.73351	0.4706546
2010	14827	8182222	NA	NA	direct	14.44422	0.3755538

Option 2: All combinations of region and gender

Split the data by all combinations of the two grouping variables. This results in a larger output table with the following number of rows:

\[n_\text{periods}\cdot(n_\text{regions} \cdot n_\text{genders} + 1) = 1\cdot(9\cdot2 + 1)= 19.\]

povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, 
                             group = list(c("gender", "region")))
povertyRates$Estimates

year	n	N	gender	region	estimate_type	val_povertyRisk	stE_povertyRisk
2010	261	122741.8	male	Burgenland	direct	17.414524	3.7608324
2010	288	137822.2	female	Burgenland	direct	21.432598	4.1870089
2010	359	182732.9	male	Vorarlberg	direct	12.973259	3.7597884
2010	374	194622.1	female	Vorarlberg	direct	19.883637	3.8858420
2010	440	253143.7	male	Salzburg	direct	9.156964	1.9057526
2010	484	282307.3	female	Salzburg	direct	17.939382	2.1700685
2010	517	268581.4	male	Carinthia	direct	10.552148	1.6181769
2010	561	295066.6	female	Carinthia	direct	15.392924	2.6174006
2010	650	339566.5	male	Tyrol	direct	12.857542	2.2710099
2010	667	362332.5	female	Tyrol	direct	17.604861	1.6905978
2010	1128	571011.7	male	Styria	direct	11.671247	1.2206931
2010	1132	774405.4	male	Vienna	direct	15.590616	1.0566260
2010	1167	596033.3	female	Styria	direct	16.964539	1.2797304
2010	1190	824525.6	female	Vienna	direct	18.778813	1.4814944
2010	1363	684272.5	male	Upper Austria	direct	9.074690	0.9212612
2010	1387	772593.2	female	Lower Austria	direct	16.372949	1.2166483
2010	1417	783115.8	male	Lower Austria	direct	11.348283	1.1851814
2010	1442	737347.5	female	Upper Austria	direct	12.574205	1.1522839
2010	14827	8182222.0	NA	NA	direct	14.444218	0.3755538

Option 3: Combination of Option 1 and Option 2

In this case, the estimates and standard errors are calculated for

every gender,
every region and
every combination of region and gender.

The number of rows in the output is therefore

\[n_\text{periods}\cdot(n_\text{regions} \cdot n_\text{genders} + n_\text{regions} + n_\text{genders} + 1) = 1\cdot(9\cdot2 + 9 + 2 + 1) = 30.\]

povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, 
                             group = list("gender", "region", c("gender", "region")))
povertyRates$Estimates

year	n	N	gender	region	estimate_type	val_povertyRisk	stE_povertyRisk
2010	261	122741.8	male	Burgenland	direct	17.414524	3.7608324
2010	288	137822.2	female	Burgenland	direct	21.432598	4.1870089
2010	359	182732.9	male	Vorarlberg	direct	12.973259	3.7597884
2010	374	194622.1	female	Vorarlberg	direct	19.883637	3.8858420
2010	440	253143.7	male	Salzburg	direct	9.156964	1.9057526
2010	484	282307.3	female	Salzburg	direct	17.939382	2.1700685
2010	517	268581.4	male	Carinthia	direct	10.552148	1.6181769
2010	549	260564.0	NA	Burgenland	direct	19.539836	3.8201446
2010	561	295066.6	female	Carinthia	direct	15.392924	2.6174006
2010	650	339566.5	male	Tyrol	direct	12.857542	2.2710099
2010	667	362332.5	female	Tyrol	direct	17.604861	1.6905978
2010	733	377355.0	NA	Vorarlberg	direct	16.537310	3.4361247
2010	924	535451.0	NA	Salzburg	direct	13.787343	1.8458914
2010	1078	563648.0	NA	Carinthia	direct	13.086268	2.0096038
2010	1128	571011.7	male	Styria	direct	11.671247	1.2206931
2010	1132	774405.4	male	Vienna	direct	15.590616	1.0566260
2010	1167	596033.3	female	Styria	direct	16.964539	1.2797304
2010	1190	824525.6	female	Vienna	direct	18.778813	1.4814944
2010	1317	701899.0	NA	Tyrol	direct	15.308191	1.8293976
2010	1363	684272.5	male	Upper Austria	direct	9.074690	0.9212612
2010	1387	772593.2	female	Lower Austria	direct	16.372949	1.2166483
2010	1417	783115.8	male	Lower Austria	direct	11.348283	1.1851814
2010	1442	737347.5	female	Upper Austria	direct	12.574205	1.1522839
2010	2295	1167045.0	NA	Styria	direct	14.374637	1.0559605
2010	2322	1598931.0	NA	Vienna	direct	17.234683	1.1871171
2010	2804	1555709.0	NA	Lower Austria	direct	13.843623	1.1256995
2010	2805	1421620.0	NA	Upper Austria	direct	10.889773	0.9377872
2010	7267	3979571.7	male	NA	direct	12.026600	0.3524528
2010	7560	4202650.3	female	NA	direct	16.733508	0.4706546
2010	14827	8182222.0	NA	NA	direct	14.444218	0.3755538
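The row counts of the three options follow directly from the formulas above and can be reproduced with a few lines of arithmetic. The variable names are chosen for this illustration only.

```r
n_periods <- 1
n_regions <- 9
n_genders <- 2

# Option 1: each region, each gender, plus the overall estimate
rows_option1 <- n_periods * (n_regions + n_genders + 1)
# Option 2: each region-gender combination, plus the overall estimate
rows_option2 <- n_periods * (n_regions * n_genders + 1)
# Option 3: combinations, single regions, single genders, overall estimate
rows_option3 <- n_periods *
  (n_regions * n_genders + n_regions + n_genders + 1)

c(rows_option1, rows_option2, rows_option3)
```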

Group differences

If differences between groups need to be calculated, e.g. the difference of poverty rates between gender = "male" and gender = "female", the parameter group.diff can be used. With group.diff = TRUE, the differences and the standard errors of these differences are calculated for all variables defined in group.

povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, 
                             group = c("gender", "region"),
                             group.diff = TRUE)
povertyRates$Estimates

year	n	N	gender	region	estimate_type	val_povertyRisk	stE_povertyRisk
2010	549.0	260564.0	NA	Burgenland	direct	19.5398365	3.8201446
2010	641.0	318959.5	NA	Burgenland - Vorarlberg	group difference	3.0025263	4.3512326
2010	733.0	377355.0	NA	Vorarlberg	direct	16.5373102	3.4361247
2010	736.5	398007.5	NA	Burgenland - Salzburg	group difference	5.7524933	4.0727197
2010	813.5	412106.0	NA	Burgenland - Carinthia	group difference	6.4535688	4.2042780
2010	828.5	456403.0	NA	Salzburg - Vorarlberg	group difference	-2.7499670	3.5203199
2010	905.5	470501.5	NA	Carinthia - Vorarlberg	group difference	-3.4510424	3.7663942
2010	924.0	535451.0	NA	Salzburg	direct	13.7873432	1.8458914
2010	933.0	481231.5	NA	Burgenland - Tyrol	group difference	4.2316460	4.2130768
2010	1001.0	549549.5	NA	Carinthia - Salzburg	group difference	-0.7010755	2.6242885
2010	1025.0	539627.0	NA	Tyrol - Vorarlberg	group difference	-1.2291197	4.6030925
2010	1078.0	563648.0	NA	Carinthia	direct	13.0862677	2.0096038
2010	1120.5	618675.0	NA	Salzburg - Tyrol	group difference	-1.5208473	2.3646678
2010	1197.5	632773.5	NA	Carinthia - Tyrol	group difference	-2.2219227	3.0161373
2010	1317.0	701899.0	NA	Tyrol	direct	15.3081905	1.8293976
2010	1422.0	713804.5	NA	Burgenland - Styria	group difference	5.1651992	4.6852127
2010	1435.5	929747.5	NA	Burgenland - Vienna	group difference	2.3051533	3.8296985
2010	1514.0	772200.0	NA	Styria - Vorarlberg	group difference	-2.1626729	4.1234841
2010	1527.5	988143.0	NA	Vienna - Vorarlberg	group difference	0.6973730	3.8746662
2010	1609.5	851248.0	NA	Salzburg - Styria	group difference	-0.5872941	2.1351288
2010	1623.0	1067191.0	NA	Salzburg - Vienna	group difference	-3.4473400	1.9229290
2010	1676.5	908136.5	NA	Burgenland - Lower Austria	group difference	5.6962137	4.4195868
2010	1677.0	841092.0	NA	Burgenland - Upper Austria	group difference	8.6500631	3.8010158
2010	1686.5	865346.5	NA	Carinthia - Styria	group difference	-1.2883695	2.2730489
2010	1700.0	1081289.5	NA	Carinthia - Vienna	group difference	-4.1484155	2.5062449
2010	1768.5	966532.0	NA	Lower Austria - Vorarlberg	group difference	-2.6936874	3.4123823
2010	1769.0	899487.5	NA	Upper Austria - Vorarlberg	group difference	-5.6475368	3.5482126
2010	1806.0	934472.0	NA	Styria - Tyrol	group difference	-0.9335532	1.8749174
2010	1819.5	1150415.0	NA	Tyrol - Vienna	group difference	-1.9264927	1.7045409
2010	1864.0	1045580.0	NA	Lower Austria - Salzburg	group difference	0.0562796	1.8470515
2010	1864.5	978535.5	NA	Salzburg - Upper Austria	group difference	2.8975698	2.2161464
2010	1941.0	1059678.5	NA	Carinthia - Lower Austria	group difference	-0.7573551	2.0484059
2010	1941.5	992634.0	NA	Carinthia - Upper Austria	group difference	2.1964944	2.4497219
2010	2060.5	1128804.0	NA	Lower Austria - Tyrol	group difference	-1.4645677	2.5759089
2010	2061.0	1061759.5	NA	Tyrol - Upper Austria	group difference	4.4184171	1.6771042
2010	2295.0	1167045.0	NA	Styria	direct	14.3746373	1.0559605
2010	2308.5	1382988.0	NA	Styria - Vienna	group difference	-2.8600459	1.6846330
2010	2322.0	1598931.0	NA	Vienna	direct	17.2346832	1.1871171
2010	2549.5	1361377.0	NA	Lower Austria - Styria	group difference	-0.5310145	1.2602200
2010	2550.0	1294332.5	NA	Styria - Upper Austria	group difference	3.4848639	1.6332932
2010	2563.0	1577320.0	NA	Lower Austria - Vienna	group difference	-3.3910604	1.9532508
2010	2563.5	1510275.5	NA	Upper Austria - Vienna	group difference	-6.3449098	1.4178877
2010	2804.0	1555709.0	NA	Lower Austria	direct	13.8436228	1.1256995
2010	2804.5	1488664.5	NA	Lower Austria - Upper Austria	group difference	2.9538494	1.9538722
2010	2805.0	1421620.0	NA	Upper Austria	direct	10.8897734	0.9377872
2010	7267.0	3979571.7	male	NA	direct	12.0266000	0.3524528
2010	7413.5	4091111.0	male - female	NA	group difference	-4.7069081	0.3539490
2010	7560.0	4202650.3	female	NA	direct	16.7335081	0.4706546
2010	14827.0	8182222.0	NA	NA	direct	14.4442182	0.3755538

The resulting output table contains 49 rows: 12 rows for the direct estimators,

\[n_\text{periods}\cdot(n_\text{regions} + n_\text{genders} + 1) = 1\cdot(9 + 2 + 1) = 12,\]

and another 37 for all the differences within the variables "gender" and "region" separately. The variable "gender" has 2 unique values (unique(dat2$gender)), resulting in 1 difference, gender = "male" - gender = "female", and the variable "region" has 9 unique values (unique(dat2$region)), resulting in

\[8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = \sum\limits_{i=1}^{9-1}i = 36\]

estimates. Thus the output contains 1 + 36 = 37 estimates with respect to group differences.

If a combination of grouping variables is used in group and group.diff = TRUE, then differences between combinations will only be calculated if exactly one of the grouping variables differs. For example, the differences between the following groups would be calculated:

gender = "female" & region = "Vienna" - gender = "male" & region = "Vienna"
gender = "female" & region = "Vienna" - gender = "female" & region = "Salzburg"
gender = "male" & region = "Salzburg" - gender = "female" & region = "Salzburg"

The difference between gender = "female" & region = "Vienna" and gender = "male" & region = "Salzburg" however would not be calculated.

Thus this leads to

\[2\cdot\left(\sum\limits_{i=1}^{9-1}i\right) + 9\cdot1 = 81\]

results with respect to the differences. The column estimate_type in the output distinguishes direct estimates from group differences, so these counts can be checked directly.

povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, 
                             group = list(c("gender", "region")),
                             group.diff = TRUE)
povertyRates$Estimates[,.N,by=.(estimate_type)]

estimate_type	N
direct	19
group difference	81
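The count of 81 can also be checked by enumeration. The sketch below builds all 18 region-gender combinations, forms every pair of combinations, and keeps those pairs in which exactly one of the two grouping variables differs. The region labels are placeholders, since only their number matters here.

```r
regions <- paste0("region", 1:9)  # placeholder labels
genders <- c("male", "female")

combos <- expand.grid(gender = genders, region = regions,
                      stringsAsFactors = FALSE)

# all unordered pairs of combinations (one pair per column)
pairs <- combn(nrow(combos), 2)

# number of grouping variables that differ within each pair (1 or 2)
n_differing <-
  (combos$gender[pairs[1, ]] != combos$gender[pairs[2, ]]) +
  (combos$region[pairs[1, ]] != combos$region[pairs[2, ]])

# differences are only formed when exactly one variable differs
n_diff_rows <- sum(n_differing == 1)
n_diff_rows
```

The pairs that are dropped are those in which both gender and region differ, such as female in Vienna versus male in Salzburg.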

Differences between survey periods

Differences of estimates between periods can be calculated using the parameter period.diff. If not NULL, period.diff expects a character vector specifying the periods for which differences should be calculated. Each element should be a string of the form "period2 - period1".

povertyRates <- calc.stError(dat_boot_calib[year>2013], var = "povertyRisk", fun = weightedRatio, 
                             period.diff = c("2017 - 2016", "2016 - 2015", "2015 - 2014"))
povertyRates$Estimates

year	n	N	estimate_type	val_povertyRisk	stE_povertyRisk
2014	14827	8182222	direct	15.1455601	0.4954098
2015	14827	8182222	direct	15.5364014	0.5456595
2015-2014	14827	8182222	period difference	0.3908413	0.3505833
2016	14827	8182222	direct	15.0831502	0.5211549
2016-2015	14827	8182222	period difference	-0.4532512	0.3818030
2017	14827	8182222	direct	15.4201916	0.3757101
2017-2016	14827	8182222	period difference	0.3370414	0.4140711
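The point estimates of the period differences are simply the differences of the corresponding direct estimates. The sketch below parses strings of the form used in period.diff and reproduces the val_povertyRisk values from the table above. It covers the point estimates only and makes no attempt to reproduce the standard errors, which depend on the replicate weights.

```r
# direct estimates copied from the table above
direct <- c("2014" = 15.1455601, "2015" = 15.5364014,
            "2016" = 15.0831502, "2017" = 15.4201916)

period.diff <- c("2017 - 2016", "2016 - 2015", "2015 - 2014")

# split "period2 - period1" and subtract the two direct estimates
diffs <- sapply(strsplit(period.diff, " - ", fixed = TRUE), function(p) {
  direct[[p[1]]] - direct[[p[2]]]
})
names(diffs) <- gsub(" ", "", period.diff)
diffs
```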

If additional grouping variables are supplied to calc.stError(), the differences across periods are also calculated for all variables in group.

povertyRates <- calc.stError(dat_boot_calib[year>2013], var = "povertyRisk", fun = weightedRatio, 
                             group = "gender",
                             period.diff = c("2017 - 2016", "2016 - 2015", "2015 - 2014"))
povertyRates$Estimates

year	n	N	gender	estimate_type	val_povertyRisk	stE_povertyRisk
2014	7267	3979572	male	direct	14.5035068	0.6086880
2014	7560	4202650	female	direct	15.7535328	0.4769638
2014	14827	8182222	NA	direct	15.1455601	0.4954098
2015	7267	3979572	male	direct	15.1228904	0.6391846
2015	7560	4202650	female	direct	15.9279630	0.5250614
2015	14827	8182222	NA	direct	15.5364014	0.5456595
2015-2014	7267	3979572	male	period difference	0.6193836	0.3561700
2015-2014	7560	4202650	female	period difference	0.1744301	0.3658346
2015-2014	14827	8182222	NA	period difference	0.3908413	0.3505833
2016	7267	3979572	male	direct	14.5796824	0.5975064
2016	7560	4202650	female	direct	15.5598937	0.5005551
2016	14827	8182222	NA	direct	15.0831502	0.5211549
2016-2015	7267	3979572	male	period difference	-0.5432080	0.3532349
2016-2015	7560	4202650	female	period difference	-0.3680693	0.4613847
2016-2015	14827	8182222	NA	period difference	-0.4532512	0.3818030
2017	7267	3979572	male	direct	14.9481591	0.3562568
2017	7560	4202650	female	direct	15.8671684	0.4535615
2017	14827	8182222	NA	direct	15.4201916	0.3757101
2017-2016	7267	3979572	male	period difference	0.3684767	0.4827706
2017-2016	7560	4202650	female	period difference	0.3072748	0.4544777
2017-2016	14827	8182222	NA	period difference	0.3370414	0.4140711

Averages across periods

With the parameter period.mean, averages across periods are calculated additionally. The parameter accepts only odd integer values. The resulting table will contain the direct estimates as well as rolling averages of length period.mean.

povertyRates <- calc.stError(dat_boot_calib[year>2013], var = "povertyRisk", fun = weightedRatio, 
                             period.mean = 3)
povertyRates$Estimates

year	n	N	estimate_type	val_povertyRisk	stE_povertyRisk
2014	14827	8182222	direct	15.14556	0.4954098
2014_2015_2016	14827	8182222	period average	15.25504	0.4615078
2015	14827	8182222	direct	15.53640	0.5456595
2015_2016_2017	14827	8182222	period average	15.34658	0.4211127
2016	14827	8182222	direct	15.08315	0.5211549
2017	14827	8182222	direct	15.42019	0.3757101
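The point estimates of the period averages can be reproduced as rolling means of the direct estimates. The sketch below uses the rounded values from the table above, so agreement is only up to rounding, and it covers the point estimates only, not the standard errors.

```r
# direct estimates copied from the table above
direct <- c("2014" = 15.14556, "2015" = 15.53640,
            "2016" = 15.08315, "2017" = 15.42019)
k <- 3  # window length, corresponds to period.mean = 3

# start index of each window of k consecutive periods
starts <- seq_len(length(direct) - k + 1)

avg <- sapply(starts, function(i) mean(direct[i:(i + k - 1)]))
names(avg) <- sapply(starts, function(i) {
  paste(names(direct)[i:(i + k - 1)], collapse = "_")
})
avg
```

With four periods and a window of three, two averages result, matching the two period average rows in the output.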

If the parameters group and/or period.diff are specified in addition, then differences and groupings of the averages will also be calculated.

povertyRates <- calc.stError(dat_boot_calib[year>2013], var = "povertyRisk", fun = weightedRatio, 
                             period.mean = 3, period.diff = "2016 - 2015",
                             group = "gender")
povertyRates$Estimates

year	n	N	gender	estimate_type	val_povertyRisk	stE_povertyRisk
2014	7267	3979572	male	direct	14.5035068	0.6086880
2014	7560	4202650	female	direct	15.7535328	0.4769638
2014	14827	8182222	NA	direct	15.1455601	0.4954098
2014_2015_2016	7267	3979572	male	period average	14.7353599	0.5710507
2014_2015_2016	7560	4202650	female	period average	15.7471298	0.4185309
2014_2015_2016	14827	8182222	NA	period average	15.2550372	0.4615078
2015	7267	3979572	male	direct	15.1228904	0.6391846
2015	7560	4202650	female	direct	15.9279630	0.5250614
2015	14827	8182222	NA	direct	15.5364014	0.5456595
2015_2016_2017	7267	3979572	male	period average	14.8835773	0.4784864
2015_2016_2017	7560	4202650	female	period average	15.7850084	0.3963737
2015_2016_2017	14827	8182222	NA	period average	15.3465811	0.4211127
2016	7267	3979572	male	direct	14.5796824	0.5975064
2016	7560	4202650	female	direct	15.5598937	0.5005551
2016	14827	8182222	NA	direct	15.0831502	0.5211549
2016-2015	7267	3979572	male	period difference	-0.5432080	0.3532349
2016-2015	7560	4202650	female	period difference	-0.3680693	0.4613847
2016-2015	14827	8182222	NA	period difference	-0.4532512	0.3818030
2016-2015_mean	7267	3979572	male	difference between period averages	0.1482174	0.1669658
2016-2015_mean	7560	4202650	female	difference between period averages	0.0378785	0.2406335
2016-2015_mean	14827	8182222	NA	difference between period averages	0.0915438	0.1818219
2017	7267	3979572	male	direct	14.9481591	0.3562568
2017	7560	4202650	female	direct	15.8671684	0.4535615
2017	14827	8182222	NA	direct	15.4201916	0.3757101