Outlier Detection

First, we use one of the algorithms to detect outliers in our sample. Suppose we use Robustified 2SLS, iterate the algorithm five times, and choose a significance level \(\gamma_{c}\) of 0.05 (cut-off 1.96 under normality) to classify an observation as an outlier or not.

Running the algorithm can be done as follows. The returned object is an object of class robust2sls. This is a list storing, inter alia, the model estimates and classifications as outliers for each iteration.

model <- outlier_detection(data = d, formula = p$setting$formula, ref_dist = "normal",
                           sign_level = 0.05, initial_est = "robustified", iterations = 5)
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
print(model)
#> Outlier-Robust 2SLS Model 
#> Initial estimator:  robustified 
#> Reference distribution:  normal 
#> Two-stage Least-Squares Model: y ~ x1 + x2 + x3 + x4 + x5 | x1 + x2 + x3 + z4 + z5 + z6 
#> Iterations:  5 
#> Final selection:  Outliers found:  29     Outliers proportion:  0.029

For some of the tests, we might want to use the same algorithm but different cut-off values. Under the null hypothesis of no outliers, we expect a proportional relationship between \(\gamma_{c}\) and the share of detected outliers. For example, for \(\gamma_{c} = 0.01\), we expect to classify 1% of the observations as outliers; for \(\gamma_{c} = 0.05\), we expect to classify 5% as outliers. The utility function multi_cutoff() allows the user to estimate several such models by giving the function a vector of multiple \(\gamma\) values.

The function can be executed in parallel by registering a parallel back-end. We choose to run the expression sequentially because it is already fast. The function simply returns a list of robust2sls objects.

# choose which gamma values to use
gammas <- seq(0.01, 0.05, 0.01)
# register backend
library(doFuture)
registerDoFuture()
plan(sequential)
models <- multi_cutoff(gamma = gammas, data = d, formula = p$setting$formula, ref_dist = "normal",
                       initial_est = "robustified", iterations = 5)
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
#> Warning in predict.ivreg(model, data): aliased coefficients in model, dropped
length(models)
#> [1] 5

Outlier Testing

This section introduces four different tests:

proportion test
count test
scaling sum test
scaling supremum test

Each subsection quickly explains the intuition, the formula for the test statistic, and provides a small example how to implement it.

As usual, all tests require a significance level to decide whether to reject or not. Note that there are two significance levels at play: We use \(\gamma_{c}\) to refer to the significance level that determines the cut-off value beyond which an error is classified as an outlier. It is a tuning parameter in the outlier_detection() algorithm. In contrast, \(\alpha\) refers to the significance level of the tests, such that we reject the null hypothesis when the p-value is smaller than the significance level.

Proportion Test

The idea of the proportion test is to check whether the detected share of outliers deviates significantly from the expected share of outliers under the null hypothesis of no outliers. The test statistic is a simple t-test based on the asymptotic distribution of the false outlier detection rate (FODR) and is hence given as \[t = \frac{\hat\gamma_{c} - \gamma_{c}}{se},\] where \(se\) is the standard error.

The function proptest() implements the test. It can either be given a robust2sls object or a list thereof, as returned by multi_cutoff().

# using a single robust2sls object
proptest(model, alpha = 0.05, iteration = 5, one_sided = FALSE)
#>   iter_test iter_act gamma        t      type       pval alpha reject
#> 1         5        5  0.05 -1.95939 two-sided 0.05006713  0.05  FALSE

# using a list of robust2sls objects
proptest(models, alpha = 0.05, iteration = 5, one_sided = TRUE)
#>           iter_test iter_act gamma         t      type      pval alpha reject
#> gamma0.01         5        5  0.01 -1.075118 one-sided 0.8588391  0.05  FALSE
#> gamma0.02         5        5  0.02 -1.220803 one-sided 0.8889197  0.05  FALSE
#> gamma0.03         5        5  0.03 -1.199388 one-sided 0.8848115  0.05  FALSE
#> gamma0.04         5        5  0.04 -1.966926 one-sided 0.9754041  0.05  FALSE
#> gamma0.05         5        5  0.05 -1.959390 one-sided 0.9749664  0.05  FALSE

The function returns a data frame and each row corresponds to one setting of \(\gamma_{c}\). Note that the value of the test statistic is the same for the first call of proptest() and the last row of the second call. They refer to the same model.

The first column of the output stores the iteration that was tested and the second column refers to the actual iteration that was tested. This only differs when the user tests the convergenct distribution, which may differ across different settings of \(\gamma_{c}\). The one-sided test only rejects for positive deviations from the expected value while the two-sided test rejects in both directions.

When testing several settings of the algorithms, we have a multiple testing issue. We can apply the Simes (1986) (DOI: 10.1093/biomet/73.3.751) procedure to fix the significance level for the global null hypothesis, which is rejected if any of the individual null hypotheses is rejected at the adjusted significance level.

proptests <- proptest(models, alpha = 0.05, iteration = 5, one_sided = TRUE)
a <- globaltest(tests = proptests, global_alpha = 0.05)

# decision for global hypothesis test
a$reject
#> [1] FALSE

# details for the Simes procedure
a$tests[, c("iter_test", "iter_act", "gamma", "t", "pval", "alpha_adj", "reject_adj")]
#>           iter_test iter_act gamma         t      pval alpha_adj reject_adj
#> gamma0.01         5        5  0.01 -1.075118 0.8588391      0.01      FALSE
#> gamma0.02         5        5  0.02 -1.220803 0.8889197      0.03      FALSE
#> gamma0.03         5        5  0.03 -1.199388 0.8848115      0.02      FALSE
#> gamma0.04         5        5  0.04 -1.966926 0.9754041      0.05      FALSE
#> gamma0.05         5        5  0.05 -1.959390 0.9749664      0.04      FALSE

Count Test

The idea is similar to the proportion test. Instead of comparing the share of detected outliers to its expected value, we now compare the expected number of detected outliers to the expected number. The test statistic asymptotically follows a Poisson distribution. \[n \hat \gamma_{c},\] where \(n\) is the sample size.

The function counttest() implements the test. As porptest(), it can either take a robust2sls object or a list thereof.

# using a single robust2sls object
counttest(model, alpha = 0.05, iteration = 5, one_sided = FALSE)
#>   iter_test iter_act gamma num_act num_exp      type        pval alpha reject
#> 1         5        5  0.05      29      50 two-sided 0.001833658  0.05   TRUE
#>   tsmethod
#> 1  central

# using a list of robust2sls objects
counttest(models, alpha = 0.05, iteration = 5, one_sided = TRUE)
#>           iter_test iter_act gamma num_act num_exp      type      pval alpha
#> gamma0.01         5        5  0.01       6      10 one-sided 0.9329140  0.05
#> gamma0.02         5        5  0.02      13      20 one-sided 0.9609880  0.05
#> gamma0.03         5        5  0.03      21      30 one-sided 0.9647154  0.05
#> gamma0.04         5        5  0.04      22      40 one-sided 0.9992660  0.05
#> gamma0.05         5        5  0.05      29      50 one-sided 0.9994895  0.05
#>           reject tsmethod
#> gamma0.01  FALSE     <NA>
#> gamma0.02  FALSE     <NA>
#> gamma0.03  FALSE     <NA>
#> gamma0.04  FALSE     <NA>
#> gamma0.05  FALSE     <NA>

As before, we can use the Simes (1986) (DOI: 10.1093/biomet/73.3.751) procedure to account for multiple hypothesis testing.

counttests <- counttest(models, alpha = 0.05, iteration = 5, one_sided = TRUE)
b <- globaltest(tests = counttests, global_alpha = 0.05)

# decision for global hypothesis test
b$reject
#> [1] FALSE

# details for the Simes procedure
b$tests[, c("iter_test", "iter_act", "gamma", "num_act", "num_exp", "pval", "alpha_adj", "reject_adj")]
#>           iter_test iter_act gamma num_act num_exp      pval alpha_adj
#> gamma0.01         5        5  0.01       6      10 0.9329140      0.01
#> gamma0.02         5        5  0.02      13      20 0.9609880      0.02
#> gamma0.03         5        5  0.03      21      30 0.9647154      0.03
#> gamma0.04         5        5  0.04      22      40 0.9992660      0.04
#> gamma0.05         5        5  0.05      29      50 0.9994895      0.05
#>           reject_adj
#> gamma0.01      FALSE
#> gamma0.02      FALSE
#> gamma0.03      FALSE
#> gamma0.04      FALSE
#> gamma0.05      FALSE

Scaling Sum Test

This test directly exploits the information we get from running several configurations of the outlier detection algorithm with varying \(\gamma_{c}\). We therefore need a list of robust2sls objects, as for example returned by multi_cutoff().

The test statistic is constructed by summing up the deviations across different cut-offs / values of \(\gamma_{c}\): \[t = \sum_{k = 1}^{K} \sqrt n (\hat \gamma_{c_{k}} - \gamma_{c_{k}})/se,\] where \(K\) is the number of different cut-off values that we have tried and \(se\) is the standard error of this sum.

c <- sumtest(models, alpha = 0.05, iteration = 1, one_sided = FALSE)

attr(c, "gammas")
#> [1] 0.01 0.02 0.03 0.04 0.05

Note that we tested iteration 1 instead of iteration 5, which was the number of iterations we applied the algorithm. In general, as long as all robust2sls model objects contain that iteration, it can be tested - even if the model objects themselves contain further iterations.

The returned data frame has an attribute called "gammas", which stores the \(\gamma_{c}\) values that were used in constructing the test.

Scaling Supremum Test

As in the previous test, we combine several deviations across different cut-offs / values of \(\gamma_{c}\). In this case, we take the supremum / maximum, since we have a finite number of \(\gamma\) values: \[t = \sup_{k = 1,...,K} |\sqrt n (\hat \gamma_{c_{k}} - \gamma_{c_{k}})|\] We simulate the asymptotic distribution of this object to derive the critical values and the p-value.

The function suptest() implements this test. The user can specify which critical values should be returned. The default is to report the values corresponding to the 90th, 95th, and 99th quantile.

d <- suptest(models, alpha = 0.05, iteration = 5)

attr(d, "gammas")
#> [1] 0.01 0.02 0.03 0.04 0.05
attr(d, "critical")
#>       90%       95%       99% 
#> 0.5894801 0.6869229 0.8879335

As for sumtest(), the returned data frame has an attribute called "gammas", which stores the \(\gamma_{c}\) values that were used in constructing the test. In addition, there is an attribute "critical", which stores the critical values of the simulated asymptotic distribution against which the test statistic was compared.

Outlier Testing

Jonas Kurle

11 June 2022

Introduction

Artificial Data

Outlier Detection

Outlier Testing

Proportion Test

Count Test

Scaling Sum Test

Scaling Supremum Test