Stochastic Search Variable Selection in bvartools

Inference based on a user-written algorithm

The prior variances of the parameters are set in accordance with the semiautomatic approach described in George et al. (2008). Hence, the prior variance of the \(i\)th parameter is set to \(\tau_{1,i}^2 = (10 \hat{\sigma}_i)^2\) if this parameter should be included in the model and to \(\tau_{0,i}^2 = (0.1 \hat{\sigma}_i)^2\) if it should be excluded, where \(\hat{\sigma}_i\) is the standard error associated with the unconstrained least squares estimate of parameter \(i\). For all variables the prior inclusion probabilities are set to 0.5. The necessary calculations can be done with the function ssvs_prior. The prior of the error variance-covariance matrix is uninformative and, in contrast to George et al. (2008), SSVS is not applied to the covariances.

# Reset random number generator for reproducibility
set.seed(1234567)

# Get data matrices
y <- t(data$data$Y)
x <- t(data$data$Z)

tt <- ncol(y) # Number of observations
k <- nrow(y) # Number of endogenous variables
m <- k * nrow(x) # Number of estimated coefficients

# Coefficient priors
a_mu_prior <- matrix(0, m) # Vector of prior means

# SSVS priors (semiautomatic approach)
vs_prior <- ssvs_prior(data, semiautomatic = c(.1, 10))
tau0 <- vs_prior$tau0
tau1 <- vs_prior$tau1

# Prior for inclusion parameter
prob_prior <- matrix(0.5, m)

# Prior for variance-covariance matrix
u_sigma_df_prior <- 0 # Prior degrees of freedom
u_sigma_scale_prior <- diag(0.00001, k) # Prior covariance matrix
u_sigma_df_post <- tt + u_sigma_df_prior # Posterior degrees of freedom
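
To make the semiautomatic rule more concrete, the following base R sketch computes \(\hat{\sigma}_i\), \(\tau_{0,i}\) and \(\tau_{1,i}\) for a small simulated single-equation regression. It is only an illustration of the rule stated above, not the internals of ssvs_prior; the simulated data and all object names (x_sim, y_sim, se_hat and so on) are assumptions made for this example.

```r
# Illustrative sketch of the semiautomatic approach using simulated data;
# all object names here are hypothetical and independent of the vignette.
set.seed(1)

n <- 200
x_sim <- cbind(x1 = rnorm(n), x2 = rnorm(n), const = 1)
y_sim <- 0.5 * x_sim[, "x1"] + 0.02 + rnorm(n, sd = 0.1) # x2 is irrelevant

# Unconstrained least squares estimate and its standard errors
xtx_inv <- solve(crossprod(x_sim))
beta_hat <- xtx_inv %*% crossprod(x_sim, y_sim)
res <- y_sim - x_sim %*% beta_hat
s2 <- sum(res^2) / (n - ncol(x_sim))
se_hat <- sqrt(diag(s2 * xtx_inv))

# Prior standard deviations for exclusion (tau0) and inclusion (tau1)
tau0_sim <- 0.1 * se_hat
tau1_sim <- 10 * se_hat

cbind(se = se_hat, tau0 = tau0_sim, tau1 = tau1_sim)
```

By construction, each \(\tau_{1,i}\) is 100 times larger than the corresponding \(\tau_{0,i}\), so the two prior regimes differ by a factor of 10,000 in terms of variance.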

The initial parameter values are set to zero and their corresponding prior variances are set to \(\tau_1^2\), which implies that all parameters should be estimated relatively freely in the first step of the Gibbs sampler.

# Initial values
a <- matrix(0, m)
a_v_i_prior <- diag(1 / c(tau1)^2, m) # Inverse of the prior covariance matrix

# Data containers for posterior draws
iterations <- 10000 # Number of Gibbs sampler draws retained after burn-in
burnin <- 5000 # Number of burn-in draws
draws <- iterations + burnin # Total number of draws

draws_a <- matrix(NA, m, iterations)
draws_lambda <- matrix(NA, m, iterations)
draws_sigma <- matrix(NA, k^2, iterations)

SSVS can be added to a standard Gibbs sampler algorithm for VAR models in a straightforward manner. The ssvs function can be used to obtain a draw of the inclusion parameters and the corresponding inverted prior variance matrix. It requires the current draw of the parameters, the prior standard deviations \(\tau_0\) and \(\tau_1\), and the prior inclusion probabilities as arguments. In this example constant terms are excluded from SSVS, which is achieved by specifying include = 1:36. Hence, only parameters 1 to 36 are considered by the function and the remaining three parameters have prior variances that correspond to their values in \(\tau_1^2\).

# Start Gibbs sampler
for (draw in 1:draws) {
  # Draw variance-covariance matrix
  u <- y - matrix(a, k) %*% x # Obtain residuals
  # Scale posterior
  u_sigma_scale_post <- solve(u_sigma_scale_prior + tcrossprod(u))
  # Draw posterior of inverse sigma
  u_sigma_i <- matrix(rWishart(1, u_sigma_df_post, u_sigma_scale_post)[,, 1], k)
  # Obtain sigma
  u_sigma <- solve(u_sigma_i)
  
  # Draw conditional mean parameters
  a <- post_normal(y, x, u_sigma_i, a_mu_prior, a_v_i_prior)
  
  # Draw inclusion parameters and update priors
  temp <- ssvs(a, tau0, tau1, prob_prior, include = 1:36)
  a_v_i_prior <- temp$v_i # Update prior
  
  # Store draws
  if (draw > burnin) {
    draws_a[, draw - burnin] <- a
    draws_lambda[, draw - burnin] <- temp$lambda
    draws_sigma[, draw - burnin] <- u_sigma
  }
}
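
The SSVS step inside the loop can be sketched in base R as follows. This is a simplified illustration under the usual assumption that each inclusion indicator is drawn from a Bernoulli distribution whose probability compares the two normal prior densities at the current coefficient value; ssvs_step_sketch and the toy inputs are hypothetical names and are not the code of the package's ssvs function.

```r
# Hypothetical helper illustrating one SSVS update; not the package's code.
ssvs_step_sketch <- function(a, tau0, tau1, prob_prior) {
  # Prior density of each coefficient under inclusion and exclusion,
  # weighted by the prior inclusion probability
  u1 <- dnorm(a, mean = 0, sd = tau1) * prob_prior
  u0 <- dnorm(a, mean = 0, sd = tau0) * (1 - prob_prior)
  prob_post <- u1 / (u0 + u1) # Conditional posterior inclusion probability
  lambda <- rbinom(length(a), size = 1, prob = prob_post) # Bernoulli draw
  # Inverse prior variance matrix implied by the drawn indicators
  v_i <- diag(1 / ifelse(lambda == 1, tau1, tau0)^2, length(a))
  list(lambda = lambda, prob_post = prob_post, v_i = v_i)
}

set.seed(1)
a_toy <- c(0.001, 0.5)    # One coefficient near zero, one clearly non-zero
tau0_toy <- c(0.01, 0.01) # Tight prior standard deviations
tau1_toy <- c(1, 1)       # Loose prior standard deviations
step <- ssvs_step_sketch(a_toy, tau0_toy, tau1_toy, prob_prior = c(0.5, 0.5))
step$prob_post
```

In this toy case the coefficient near zero receives a very small inclusion probability, while the coefficient far from zero is included almost surely, because its value is practically impossible under the tight prior.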

The output of a Gibbs sampler with SSVS can be further analysed in the usual way. With the bvartools package the posterior draws can be collected in a bvar object, and the summary method provides summary statistics. It is also possible to add information on the inclusion parameters to the bvar object by providing a named list as an argument. The list must contain an element coeffs, which contains the MCMC draws of the coefficients, and an element lambda, which contains the corresponding draws of the inclusion parameters.

bvar_est <- bvar(y = data$data$Y, x = data$data$Z,
                 A = list(coeffs = draws_a[1:36,],
                          lambda = draws_lambda[1:36,]),
                 C = list(coeffs = draws_a[37:39, ],
                          lambda = draws_lambda[37:39,]),
                 Sigma = draws_sigma)

bvar_summary <- summary(bvar_est)

bvar_summary
#> 
#> Bayesian VAR model with p = 4 
#> 
#> Model:
#> 
#> y ~ invest.01 + income.01 + cons.01 + invest.02 + income.02 + cons.02 + invest.03 + income.03 + cons.03 + invest.04 + income.04 + cons.04 + const
#> 
#> Variable: invest 
#> 
#>                Mean      SD  Naive SD Time-series SD      2.5%       50%
#> invest.01 -0.102678 0.13826 0.0013826      0.0067122 -0.411837 -0.016493
#> income.01  0.030302 0.17083 0.0017083      0.0034326 -0.124401  0.008322
#> cons.01    0.095132 0.32165 0.0032165      0.0100857 -0.156626  0.019395
#> invest.02 -0.012007 0.05070 0.0005070      0.0012843 -0.187390 -0.002507
#> income.02  0.006950 0.14724 0.0014724      0.0017426 -0.156169  0.002443
#> cons.02    0.021886 0.17860 0.0017860      0.0032578 -0.182990  0.007841
#> invest.03  0.032122 0.07987 0.0007987      0.0025971 -0.026218  0.005942
#> income.03 -0.007616 0.15335 0.0015335      0.0020519 -0.296980 -0.001926
#> cons.03   -0.047414 0.21576 0.0021576      0.0044539 -0.724049 -0.013501
#> invest.04  0.264164 0.15632 0.0015632      0.0098066 -0.009652  0.290565
#> income.04 -0.055766 0.21719 0.0021719      0.0049718 -0.802918 -0.014542
#> cons.04   -0.021885 0.18660 0.0018660      0.0031490 -0.515568 -0.006293
#> const      0.013325 0.01208 0.0001208      0.0002352 -0.012475  0.013477
#>             97.5% Incl. prob.
#> invest.01 0.02130      0.4262
#> income.01 0.57057      0.0797
#> cons.01   1.20857      0.1305
#> invest.02 0.03014      0.1068
#> income.02 0.27650      0.0701
#> cons.02   0.40729      0.0623
#> invest.03 0.28849      0.1870
#> income.03 0.19233      0.0763
#> cons.03   0.15528      0.0911
#> invest.04 0.53103      0.8281
#> income.04 0.12295      0.1135
#> cons.04   0.16283      0.0797
#> const     0.03766      1.0000
#> 
#> Variable: income 
#> 
#>                 Mean       SD  Naive SD Time-series SD      2.5%        50%
#> invest.01  9.956e-03 0.021913 2.191e-04      7.948e-04 -0.006260  1.827e-03
#> income.01 -2.366e-02 0.085353 8.535e-04      4.298e-03 -0.313599 -3.453e-03
#> cons.01    1.308e-01 0.190136 1.901e-03      1.324e-02 -0.033440  2.252e-02
#> invest.02  1.851e-03 0.011114 1.111e-04      2.645e-04 -0.008256  4.361e-04
#> income.02  4.504e-03 0.041084 4.108e-04      6.930e-04 -0.037925  1.644e-03
#> cons.02   -6.459e-04 0.037967 3.797e-04      4.375e-04 -0.054470 -4.947e-04
#> invest.03 -6.077e-05 0.008284 8.284e-05      9.137e-05 -0.012292 -3.118e-05
#> income.03  1.780e-02 0.060741 6.074e-04      2.057e-03 -0.031838  4.959e-03
#> cons.03    6.922e-03 0.049219 4.922e-04      9.647e-04 -0.047575  2.479e-03
#> invest.04  1.925e-03 0.010591 1.059e-04      2.051e-04 -0.008012  4.585e-04
#> income.04 -1.114e-02 0.045967 4.597e-04      1.077e-03 -0.165655 -3.475e-03
#> cons.04    1.297e-03 0.034280 3.428e-04      3.310e-04 -0.043869  7.042e-04
#> const      1.742e-02 0.003922 3.922e-05      1.701e-04  0.008960  1.802e-02
#>             97.5% Incl. prob.
#> invest.01 0.07729      0.2167
#> income.01 0.03551      0.1404
#> cons.01   0.58572      0.3881
#> invest.02 0.03535      0.0842
#> income.02 0.09878      0.0715
#> cons.02   0.05278      0.0506
#> invest.03 0.01080      0.0706
#> income.03 0.23046      0.1069
#> cons.03   0.13588      0.0752
#> invest.04 0.03454      0.0965
#> income.04 0.03167      0.0934
#> cons.04   0.04732      0.0553
#> const     0.02394      1.0000
#> 
#> Variable: cons 
#> 
#>                 Mean       SD  Naive SD Time-series SD      2.5%        50%
#> invest.01 -0.0019158 0.009862 9.862e-05      2.629e-04 -0.033258 -4.062e-04
#> income.01  0.1557165 0.140919 1.409e-03      1.107e-02 -0.015750  1.699e-01
#> cons.01   -0.2704923 0.197753 1.978e-03      1.768e-02 -0.596806 -3.102e-01
#> invest.02  0.0055334 0.015130 1.513e-04      5.850e-04 -0.005640  1.104e-03
#> income.02  0.3056906 0.100301 1.003e-03      4.675e-03  0.013740  3.133e-01
#> cons.02    0.0098651 0.055799 5.580e-04      3.613e-03 -0.040072  2.129e-03
#> invest.03  0.0003352 0.006858 6.858e-05      8.552e-05 -0.007689  6.655e-05
#> income.03  0.0106024 0.043671 4.367e-04      1.472e-03 -0.026945  2.755e-03
#> cons.03    0.0211561 0.062100 6.210e-04      1.952e-03 -0.029379  5.506e-03
#> invest.04 -0.0040890 0.012678 1.268e-04      3.930e-04 -0.048091 -8.160e-04
#> income.04  0.0258709 0.061971 6.197e-04      2.026e-03 -0.022137  5.951e-03
#> cons.04   -0.0001850 0.030872 3.087e-04      4.057e-04 -0.046276  3.982e-05
#> const      0.0140811 0.003528 3.528e-05      1.058e-04  0.007062  1.407e-02
#>              97.5% Incl. prob.
#> invest.01 0.006557      0.1047
#> income.01 0.408675      0.6205
#> cons.01   0.018590      0.7312
#> invest.02 0.055340      0.1609
#> income.02 0.482837      0.9685
#> cons.02   0.187899      0.0879
#> invest.03 0.012455      0.0711
#> income.03 0.159233      0.1018
#> cons.03   0.230200      0.1362
#> invest.04 0.005295      0.1339
#> income.04 0.222811      0.1825
#> cons.04   0.040838      0.0689
#> const     0.021018      1.0000
#> 
#> Variance-covariance matrix:
#> 
#>                    Mean        SD  Naive SD Time-series SD       2.5%       50%
#> invest_invest 2.185e-03 3.999e-04 3.999e-06      6.265e-06  1.532e-03 2.142e-03
#> invest_income 4.850e-05 7.491e-05 7.491e-07      9.895e-07 -9.693e-05 4.608e-05
#> invest_cons   1.402e-04 6.155e-05 6.155e-07      7.129e-07  2.726e-05 1.363e-04
#> income_invest 4.850e-05 7.491e-05 7.491e-07      9.895e-07 -9.693e-05 4.608e-05
#> income_income 1.513e-04 2.695e-05 2.695e-07      2.843e-07  1.068e-04 1.484e-04
#> income_cons   6.660e-05 1.781e-05 1.781e-07      2.403e-07  3.610e-05 6.500e-05
#> cons_invest   1.402e-04 6.155e-05 6.155e-07      7.129e-07  2.726e-05 1.363e-04
#> cons_income   6.660e-05 1.781e-05 1.781e-07      2.403e-07  3.610e-05 6.500e-05
#> cons_cons     9.778e-05 1.799e-05 1.799e-07      3.400e-07  6.843e-05 9.601e-05
#>                   97.5%
#> invest_invest 0.0030856
#> invest_income 0.0002018
#> invest_cons   0.0002725
#> income_invest 0.0002018
#> income_income 0.0002132
#> income_cons   0.0001062
#> cons_invest   0.0002725
#> cons_income   0.0001062
#> cons_cons     0.0001381

The inclusion probabilities of the constant terms are 100 percent, because they were excluded from SSVS.
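
Posterior inclusion probabilities are usually estimated as the share of retained iterations in which a coefficient was included. Assuming that the summary above follows this convention, the calculation amounts to row means of the stored indicator draws. The sketch below uses a small simulated matrix, lambda_toy, which is a hypothetical stand-in for draws_lambda.

```r
# Toy matrix of inclusion draws: rows are coefficients, columns are iterations.
set.seed(1)
lambda_toy <- rbind(
  rbinom(1000, 1, 0.9), # Frequently included coefficient
  rbinom(1000, 1, 0.1), # Rarely included coefficient
  rep(1, 1000)          # Coefficient excluded from SSVS, always in the model
)

# Share of iterations in which each coefficient was included
incl_prob <- rowMeans(lambda_toy)
incl_prob
```

The third row mimics a constant term: because its indicator is never drawn, it equals one in every iteration and the estimated inclusion probability is exactly one.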

Using the results from above, the researcher could proceed in the usual way and obtain forecasts and impulse responses based on the output of the Gibbs sampler. The advantage of this approach is that it takes into account not only parameter uncertainty but also model uncertainty. This can be illustrated by the histogram of the posterior draws of the 6th coefficient, which describes the relationship between the first lag of income and the current value of consumption.

hist(draws_a[6,], main = "Consumption ~ First lag of income", xlab = "Value of posterior draw")

A non-negligible mass of some 38 percent of the parameter draws, i.e. 1 - 0.62, is concentrated around zero. This is the result of SSVS, where posterior draws are close to zero if a coefficient is assessed to be irrelevant during an iteration of the Gibbs sampler and, therefore, \(\tau_{0,6}^2\) is used as its prior variance. On the other hand, about 62 percent of the draws, in line with the inclusion probability of 0.6205 reported above for income.01 in the consumption equation, are dispersed around a positive value, where SSVS suggests including the variable in the model and the larger value \(\tau_{1,6}^2\) is used as prior variance. Model uncertainty is then described by the two peaks and parameter uncertainty by the dispersion of the posterior draws around them.
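
The share of draws in the peak around zero can also be approximated directly from the draws. The following sketch uses simulated values instead of the actual posterior draws: the mixing weight, the location and spread of the two components, and the band width of 0.02 are all arbitrary assumptions chosen for illustration.

```r
# Simulated stand-in for SSVS draws of one coefficient (hypothetical values).
set.seed(1)
n_draws <- 10000
included <- rbinom(n_draws, 1, 0.62) == 1
draws_toy <- ifelse(included,
                    rnorm(n_draws, mean = 0.25, sd = 0.08), # Loose prior regime
                    rnorm(n_draws, mean = 0, sd = 0.005))   # Tight prior regime

# Share of draws in a narrow band around zero; the band width is a judgment call
share_near_zero <- mean(abs(draws_toy) < 0.02)
share_near_zero
```

With real output the same idea would be applied to a row of draws_a, and the result depends on how wide the band around zero is chosen, so the mean of the indicator draws remains the more direct measure.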

However, if the researcher prefers not to work with a model in which the relevance of a variable can change from one step of the sampling algorithm to the next, a different approach would be to work only with a highly probable model. This can be done with a further simulation, where very tight priors are used for irrelevant variables and relatively uninformative priors for relevant ones. In this example, coefficients with a posterior inclusion probability of at least 40 percent are considered to be relevant.² The prior variance is set to 0.00001 for irrelevant and to 9 for relevant variables. No additional SSVS step is required. Everything else remains unchanged.

# Get inclusion probabilities
lambda <- bvar_summary$coefficients$lambda

# Select variables that should be included
include_var <- c(lambda >= .4)

# Update prior variances
diag(a_v_i_prior)[!include_var] <- 1 / 0.00001 # Very tight prior close to zero
diag(a_v_i_prior)[include_var] <- 1 / 9 # Relatively uninformative prior

# Data containers for posterior draws
draws_a <- matrix(NA, m, iterations)
draws_sigma <- matrix(NA, k^2, iterations)

# Start Gibbs sampler
for (draw in 1:draws) {
  # Draw conditional mean parameters
  a <- post_normal(y, x, u_sigma_i, a_mu_prior, a_v_i_prior)
  
  # Draw variance-covariance matrix
  u <- y - matrix(a, k) %*% x # Obtain residuals
  u_sigma_scale_post <- solve(u_sigma_scale_prior + tcrossprod(u))
  u_sigma_i <- matrix(rWishart(1, u_sigma_df_post, u_sigma_scale_post)[,, 1], k)
  u_sigma <- solve(u_sigma_i) # Invert Sigma_i to obtain Sigma
  
  # Store draws
  if (draw > burnin) {
    draws_a[, draw - burnin] <- a
    draws_sigma[, draw - burnin] <- u_sigma
  }
}
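
Both samplers treat post_normal as a black box. Assuming it draws from the standard conditional normal posterior of the vectorised coefficients, with posterior precision \(V^{-1} + (x x^{\prime} \otimes \Sigma^{-1})\), the step can be sketched in base R as follows. The function draw_coef_sketch and the toy data are hypothetical and only illustrate how the tight and loose prior precisions enter the draw.

```r
# Hypothetical re-implementation for illustration only.
draw_coef_sketch <- function(y, x, sigma_i, a_prior, v_i_prior) {
  # Posterior precision: prior precision plus data precision
  v_post_i <- v_i_prior + kronecker(tcrossprod(x), sigma_i)
  v_post <- solve(v_post_i)
  # Posterior mean combines prior mean and data, weighted by their precisions
  mu_post <- v_post %*% (v_i_prior %*% a_prior + c(sigma_i %*% tcrossprod(y, x)))
  # Draw from N(mu_post, v_post) using the Cholesky factor of v_post
  mu_post + t(chol(v_post)) %*% rnorm(length(mu_post))
}

set.seed(1)
k_toy <- 2
n_toy <- 500
a_true <- matrix(c(0.5, 0, 0.2, 0.3), k_toy) # True coefficient matrix
x_toy <- matrix(rnorm(k_toy * n_toy), k_toy) # Simulated regressors, one column per period
y_toy <- a_true %*% x_toy + matrix(rnorm(k_toy * n_toy, sd = 0.1), k_toy)

a_draw <- draw_coef_sketch(y_toy, x_toy,
                           sigma_i = diag(1 / 0.1^2, k_toy), # Known error precision
                           a_prior = matrix(0, k_toy^2),
                           v_i_prior = diag(1 / 9, k_toy^2)) # Loose prior
matrix(a_draw, k_toy)
```

A large diagonal element in v_i_prior, such as 1 / 0.00001, dominates the data precision and pulls the corresponding coefficient towards its prior mean of zero, which is how the restricted model is imposed in the second sampler.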

The means of the posterior draws are similar to the OLS estimates in Lütkepohl (2006, Section 5.2.10):

bvar_est <- bvar(y = data$data$Y, x = data$data$Z, A = draws_a[1:36,],
                 C = draws_a[37:39, ], Sigma = draws_sigma)

summary(bvar_est)
#> 
#> Bayesian VAR model with p = 4 
#> 
#> Model:
#> 
#> y ~ invest.01 + income.01 + cons.01 + invest.02 + income.02 + cons.02 + invest.03 + income.03 + cons.03 + invest.04 + income.04 + cons.04 + const
#> 
#> Variable: invest 
#> 
#>                 Mean       SD  Naive SD Time-series SD      2.5%        50%
#> invest.01 -2.273e-01 0.109891 1.099e-03      1.132e-03 -0.442479 -2.280e-01
#> income.01  6.902e-07 0.003206 3.206e-05      3.206e-05 -0.006286 -9.111e-06
#> cons.01   -3.872e-05 0.003141 3.141e-05      3.141e-05 -0.006291 -1.351e-05
#> invest.02 -1.476e-04 0.003188 3.188e-05      3.188e-05 -0.006379 -1.627e-04
#> income.02  1.006e-05 0.003115 3.115e-05      3.115e-05 -0.006058  6.718e-06
#> cons.02    1.298e-05 0.003183 3.183e-05      3.183e-05 -0.006185  7.462e-06
#> invest.03  9.864e-05 0.003181 3.181e-05      3.181e-05 -0.006223  5.629e-05
#> income.03 -4.254e-05 0.003156 3.156e-05      3.238e-05 -0.006291 -1.891e-05
#> cons.03    2.170e-05 0.003148 3.148e-05      3.058e-05 -0.006153  1.975e-05
#> invest.04  3.267e-01 0.108329 1.083e-03      1.083e-03  0.113552  3.267e-01
#> income.04 -1.611e-05 0.003135 3.135e-05      3.135e-05 -0.006243  8.115e-06
#> cons.04    4.579e-05 0.003154 3.154e-05      3.076e-05 -0.006085  7.562e-05
#> const      1.519e-02 0.006069 6.069e-05      6.069e-05  0.003176  1.523e-02
#>               97.5%
#> invest.01 -0.013764
#> income.01  0.006310
#> cons.01    0.006061
#> invest.02  0.006064
#> income.02  0.006000
#> cons.02    0.006263
#> invest.03  0.006299
#> income.03  0.006131
#> cons.03    0.006159
#> invest.04  0.536211
#> income.04  0.006153
#> cons.04    0.006270
#> const      0.026959
#> 
#> Variable: income 
#> 
#>                 Mean       SD  Naive SD Time-series SD      2.5%        50%
#> invest.01  6.869e-04 0.003136 3.136e-05      3.193e-05 -0.005465  6.890e-04
#> income.01  1.075e-05 0.003137 3.137e-05      3.160e-05 -0.006210  2.738e-05
#> cons.01    1.151e-04 0.003147 3.147e-05      3.090e-05 -0.005923  1.105e-04
#> invest.02  6.772e-05 0.003097 3.097e-05      3.097e-05 -0.006038  1.310e-04
#> income.02  5.710e-05 0.003196 3.196e-05      3.401e-05 -0.006141  5.508e-05
#> cons.02   -1.453e-05 0.003160 3.160e-05      3.160e-05 -0.006246 -2.748e-05
#> invest.03  5.477e-05 0.003141 3.141e-05      3.141e-05 -0.006140  7.633e-05
#> income.03  1.774e-04 0.003156 3.156e-05      3.156e-05 -0.006066  1.634e-04
#> cons.03    3.008e-05 0.003135 3.135e-05      3.135e-05 -0.006256  4.382e-05
#> invest.04  2.947e-04 0.003174 3.174e-05      3.120e-05 -0.005938  3.048e-04
#> income.04 -4.550e-05 0.003163 3.163e-05      3.163e-05 -0.006330 -1.594e-05
#> cons.04   -3.258e-05 0.003163 3.163e-05      3.163e-05 -0.006236 -6.488e-05
#> const      2.014e-02 0.001479 1.479e-05      1.479e-05  0.017236  2.016e-02
#>              97.5%
#> invest.01 0.006849
#> income.01 0.006054
#> cons.01   0.006304
#> invest.02 0.006066
#> income.02 0.006299
#> cons.02   0.006191
#> invest.03 0.006201
#> income.03 0.006285
#> cons.03   0.006174
#> invest.04 0.006435
#> income.04 0.006075
#> cons.04   0.006140
#> const     0.023037
#> 
#> Variable: cons 
#> 
#>                 Mean       SD  Naive SD Time-series SD      2.5%        50%
#> invest.01 -5.275e-04 0.003134 3.134e-05      3.134e-05 -0.006716 -5.338e-04
#> income.01  2.618e-01 0.085979 8.598e-04      8.924e-04  0.094695  2.612e-01
#> cons.01   -4.362e-01 0.102607 1.026e-03      1.083e-03 -0.638762 -4.368e-01
#> invest.02  6.559e-04 0.003116 3.116e-05      3.116e-05 -0.005524  6.822e-04
#> income.02  3.289e-01 0.076682 7.668e-04      7.837e-04  0.176874  3.294e-01
#> cons.02   -8.815e-06 0.003175 3.175e-05      3.175e-05 -0.006188  6.246e-06
#> invest.03  8.475e-05 0.003139 3.139e-05      3.139e-05 -0.006145  1.200e-04
#> income.03  1.198e-04 0.003193 3.193e-05      3.193e-05 -0.006138  1.262e-04
#> cons.03    1.700e-04 0.003189 3.189e-05      3.286e-05 -0.006022  1.339e-04
#> invest.04 -5.389e-04 0.003125 3.125e-05      3.125e-05 -0.006711 -4.977e-04
#> income.04  1.594e-04 0.003161 3.161e-05      3.161e-05 -0.005997  1.459e-04
#> cons.04    4.326e-05 0.003131 3.131e-05      3.131e-05 -0.006195  7.535e-05
#> const      1.611e-02 0.002675 2.675e-05      2.783e-05  0.010777  1.613e-02
#>               97.5%
#> invest.01  0.005679
#> income.01  0.431811
#> cons.01   -0.233266
#> invest.02  0.006694
#> income.02  0.479725
#> cons.02    0.006226
#> invest.03  0.006270
#> income.03  0.006302
#> cons.03    0.006470
#> invest.04  0.005557
#> income.04  0.006426
#> cons.04    0.006137
#> const      0.021319
#> 
#> Variance-covariance matrix:
#> 
#>                    Mean        SD  Naive SD Time-series SD       2.5%       50%
#> invest_invest 2.091e-03 3.726e-04 3.726e-06      3.857e-06  1.487e-03 2.051e-03
#> invest_income 6.378e-05 7.091e-05 7.091e-07      7.335e-07 -7.152e-05 6.190e-05
#> invest_cons   1.383e-04 5.890e-05 5.890e-07      6.105e-07  3.342e-05 1.342e-04
#> income_invest 6.378e-05 7.091e-05 7.091e-07      7.335e-07 -7.152e-05 6.190e-05
#> income_income 1.529e-04 2.713e-05 2.713e-07      2.774e-07  1.081e-04 1.497e-04
#> income_cons   6.897e-05 1.762e-05 1.762e-07      1.868e-07  3.912e-05 6.726e-05
#> cons_invest   1.383e-04 5.890e-05 5.890e-07      6.105e-07  3.342e-05 1.342e-04
#> cons_income   6.897e-05 1.762e-05 1.762e-07      1.868e-07  3.912e-05 6.726e-05
#> cons_cons     9.637e-05 1.758e-05 1.758e-07      1.903e-07  6.789e-05 9.434e-05
#>                   97.5%
#> invest_invest 0.0029372
#> invest_income 0.0002109
#> invest_cons   0.0002659
#> income_invest 0.0002109
#> income_income 0.0002144
#> income_cons   0.0001085
#> cons_invest   0.0002659
#> cons_income   0.0001085
#> cons_cons     0.0001354

Forecasts, impulse responses and variance decompositions can be obtained in the usual manner.

¹ See Koop and Korobilis (2010) for an introduction to Bayesian VAR modelling and SSVS.
² This threshold value is usually set to 50 percent. 40 percent is chosen because it yields results similar to the restricted model in Lütkepohl (2006, Section 5.2.10).

Stochastic Search Variable Selection in bvartools

Franz X. Mohr

2024-01-08

Introduction

Inference based on a user-written algorithm

Using the built-in simulation algorithm of `bvartools`

References

Using the built-in simulation algorithm of `bvartools`