[R-sig-ME] Questions on Penalization and Prior Variance in bglmer for data with Complete Separation

Fri Feb 7 16:56:49 CET 2025

> I couldn’t determine which method blme uses for penalization in bglmer.
Is it based on maximum likelihood, or does it use Markov-Chain Monte Carlo
(MCMC) methods?

>From the package description: "Maximum a posteriori estimation for linear
and generalized linear mixed-effects models in a Bayesian setting,
implementing the methods of Chung, et al. (2013)
<doi:10.1007/s11336-013-9328-2>".

So it applies the priors you specify and then maximizes the posterior.

> Additionally, is it acceptable to analyze the results of this model using
a frequentist approach?

Sort of? If your sample size is large then then the contributions of the
priors vanish and the algorithm will converge to the same place as maximum
likelihood (unless that fails to converge because the parameters are truly
at the boundary of the space). However, if you already have a well-defined
problem where the maximum likelihood estimate exists and is stable, you're
probably not going to be using blme. In that case, there really isn't a
coherent interpretation to the standard errors blme returns beyond a
measure of the curvature at the posterior mode, which corresponds to a
normal approximation. That generally isn't considered to be very
meaningful, because the posterior can be far from normal. For models where
lmer fails to converge, blme is intended to allow the researcher to rapidly
prototype solutions before fitting in an MCMC Bayesian setting where
posterior credible intervals can be obtained.

> what information should I consider to choose an appropriate variance
value?

I suppose that depends on your goal. If you're just trying to
penalize/regularize a model, then using a weakly informative prior based on
the built-in scale of the logistic function is sufficient (the default). If
you actually have substantive information you want to incorporate, then
treat your posterior uncertainty from that as your prior uncertainty in a
new model. However, in my mind that would largely apply if there is a
meta-analysis you can leverage or if the data you are analyzing are part of
a larger dataset that has been partially analyzed.

> What might be the cause of this issue?

Ind contains no repeated values, so the random effects can cause perfect
prediction. e.g.

random_intercepts <- ranef(m1)$Ind
cor(
  random_intercepts[match(Data$Ind,
as.integer(row.names(random_intercepts))),],
  Data$Response
)
[1] 0.9999599

You probably want to fit a glm instead for this example, which I believe
gives you the results you are expecting to see.

Cheers,
Vince

On Thu, Feb 6, 2025 at 8:50 PM Tomás Manuel Chialina <tmanuelch using gmail.com>
wrote:

> Hi everyone,
> I'm writing because I have a couple of questions about analyzing data with
> complete separation when fitting a Binomial GLMM. I've been following the
> suggestions made by Ben Bolker in this StackOverflow post<
> https://stats.stackexchange.com/questions/132677/binomial-glmm-with-a-categorical-variable-with-full-successes/132678#132678>
> and in the glmmTMB FAQ<
> https://bbolker.github.io/mixedmodels-misc/ecostats_chap.html>.
> (You can generate data with the lines below)
> set.seed(123)
> # Number of observations per Trial
> n_29 <- 50
> n_30 <- 50
> Data <- data.frame(
>   Ind = 1:(n_29 + n_30),
>   Trial = rep(c(29, 30), times = c(n_29, n_30)),
>   Response = c(rep(1, n_29 * 0.76), rep(0, n_29 * 0.24),
>                    rep(1, n_30 * 1), rep(0, n_30 * 0))
> )
> I've been using the bglmer alternative (blme package) mentioned there,
> which involves including a prior on the fixed effects to penalize the
> parameter estimates. The model looks like this:
> m1 <- bglmer(Response ~ as.factor(Trial) + (1|Ind),
> family = binomial,
> fixef.prior = normal(cov = diag(8,2)),
> data = Data)
> (where 8 is the variance parameter and 2 the number of levels of the
> factor Trial).
> I have two main questions:
> Penalization Method in bglmer:
> I couldn’t determine which method blme uses for penalization in bglmer. Is
> it based on maximum likelihood, or does it use Markov-Chain Monte Carlo
> (MCMC) methods? I’m relatively new to Bayesian methods and have been
> interpreting the bglmer results similarly to how I interpret GLMMs in a
> frequentist framework (e.g., calculating p-values). However, I’m still
> uncertain about the specific method applied by bglmer. The package
> documentation doesn’t provide much detail on this aspect. Additionally, is
> it acceptable to analyze the results of this model using a frequentist
> approach?
> Choosing the Prior Variance:
> Regarding the specification of the prior in bglmer function (e.g.,
> fixef.prior = normal(cov = diag(variance, n_fixed_effects)) ), what
> information should I consider to choose an appropriate variance value?
> Would it be reasonable to use as a reference the standard error of a
> parameter estimate not affected by complete separation in the non-penalized
> GLMM (e.g., using glmmTMB function)? For example, given the summary of the
> following non-penalized model:
> m1NonPenalized <- glmmTMB(Response ~ as.factor(Trial) +(1|Ind),
>                 family = binomial,
>                 data = DataM1E2)
>
> summary(m1NonPenalized)
> Conditional model:
>                     Estimate Std. Error z value Pr(>|z|)
> (Intercept)        1.153e+00  4.958e-01   2.325   0.0201 *
> as.factor(Trial)30 3.338e+01  4.458e+06   0.000   1.0000
> Here, we observe that Trial 30 is affected by separation as its standard
> error is extremely large, whereas the intercept (Trial 29) is not. Would
> the standard error of Trial 29  serve as a reference for choosing an
> appropriate variance value (i.e., as the std.error is 0.49, a variance
> value of 1 or higher would be reasonable)?
> Additionally, I encountered an issue with the estimation of probabilities
> in the model fitted with bglmer (m1). You can generate an histogram to
> inspect the observed frequencies of data in order to follow the discussion
> below:
> hist <- Data %>%  ggplot(aes(x = as.factor(Response))) +
> geom_bar(aes(y = after_stat(prop) * 100,group = Trial, fill = Trial),
> colour = "black") +
>   scale_y_continuous(limits = c(0,100), n.breaks = 10) +
>   ylab("% of responses") +
>   xlab("Response") +
>   facet_grid(. ~ Trial)
> While the observed probabilities of response were around 0.76  for Trial
> 29 and 1 for Trial 30, the model estimated probabilities of approximately
> 0.999 for both, with incredibly small standard errors (much smaller than
> expected given the data variability):
>
> Trial    prob         SE  df asymp.LCL asymp.UCL
>
>  29    0.99985 2.1918e-04 Inf   0.99747   0.99999
>
>  30    0.99998 3.3708e-05 Inf   0.99921   1.00000
>
> I noticed that reducing the variance value from 8 to 1 inside fixef.prior
> produced estimates closer to the observed values, as you can see below:
>
> Trial  prob     SE  df asymp.LCL asymp.UCL
>
>  29    0.828 0.0769 Inf     0.626     0.933
>
>  30    0.973 0.0195 Inf     0.893     0.994
>
> What might be the cause of this issue?
> Thank you very much in advance for your time and help!
> Cheers,
> Tomás
>
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

	[[alternative HTML version deleted]]