[R-sig-ME] [EXT] Re: Choice of distribution for random effects

Fri Jul 22 00:11:26 CEST 2022

As noted earlier, I’m not familiar with hglm, but a cursory glance suggests that those are not the same model.  You appear to have specified Beta distributed random effects in the call to hglm2.

Cheers,

Andrew

--
Andrew Robinson
Chief Executive Officer, CEBRA and Professor of Biosecurity,
School/s of BioSciences and Mathematics & Statistics
University of Melbourne, VIC 3010 Australia
Tel: (+61) 0403 138 955
Email: apro using unimelb.edu.au
Website: https://researchers.ms.unimelb.edu.au/~apro@unimelb/

I acknowledge the Traditional Owners of the land I inhabit, and pay my respects to their Elders.
On 22 Jul 2022, 7:45 AM +1000, J.D. Haltigan <jhaltiga using gmail.com>, wrote:
External email: Please exercise caution

________________________________
Thanks for this continued deep dive as it is very instructive for me. I am currently also reading the Wood GAM book which is very helpful.

As an aside, and perhaps relevant to the link discourse, in the current project I am working on, I fit an lmer with:

***lme4_1_B = lmer(posXsymp~treatment+proper_mask_base+prop_resp_ill_base_2 + pairID + (1 | union), data = bdata.raw1)#lme4 package***
which converges and provides results that are sensible (compared to models that have been run using fixed effects only).

However, when I try to use HGLM to fit this model, it does not want to cooperate and I am wondering why. If I specify the HGLM as should be the case for linear models with family = Guassian, as:

***HGLM2_1_A = hglm2(posXsymp~ treatment+proper_mask_base+prop_resp_ill_base_2 + pairID + (1 | union), family=gaussian(link=identity),
                  rand.family=Beta(link=logit), maxit = 100, data = bdata.raw1)#HGLM package using hglm2***

I get: Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  :
  NA/NaN/Inf in 'x'
In addition: Warning message:
In hglm.default(X = X, y = Y, Z = Z, family = family, rand.family = rand.family,  :
  Residuals numerically 0 are replaced by 1e-8
>

I am wondering if this is b/c "pairID" is a factor variable. I simply don't understand why lmer would fit this but HGLM returns that error.

On Thu, Jul 21, 2022 at 4:18 PM Andrew Robinson <apro using unimelb.edu.au<mailto:apro using unimelb.edu.au>> wrote:
And, with my apologies for being fastidious, as you say the family typically specifies the link function *in the software*, selecting the (so-called) canonical link.

That doesn’t make it the right link function for any given modeling exercise. See, e.g., https://aip.scitation.org/doi/pdf/10.1063/1.5139815<https://aip.scitation.org/doi/pdf/10.1063/1.5139815>, and particularly Figure 1.

As Joe Hilbe and I wrote, it is not clear that there is any statistical benefit in the canonization of a particular link function for each family.

Cheers,

Andrew
On 22 Jul 2022, 2:04 AM +1000, Ben Bolker <bbolker using gmail.com<mailto:bbolker using gmail.com>>, wrote:
I want to clarify one tiny thing.

The family typically specifies the *link function* as well as the
conditional distribution, although this is often done by default (e.g.
poisson -> log, binomial -> logit/log-odds). The link function
specifies the scale on which the data are fitted, so although the random
effects are always Gaussian *on the scale of the linear predictor*
(i.e., the transformed scale), they vary on the scale of the data. If
the distribution of group-level intercepts is Gaussian on the log scale,
then it's log-Normal on the data (count) scale.

On 2022-07-20 11:36 p.m., Andrew Robinson wrote:
That is correct.

Cheers,

Andrew

--
Andrew Robinson
Chief Executive Officer, CEBRA and Professor of Biosecurity,
School/s of BioSciences and Mathematics & Statistics
University of Melbourne, VIC 3010 Australia
Tel: (+61) 0403 138 955
Email: apro using unimelb.edu.au<mailto:apro using unimelb.edu.au>
Website: https://researchers.ms.unimelb.edu.au/~apro@unimelb/

I acknowledge the Traditional Owners of the land I inhabit, and pay my respects to their Elders.
On 21 Jul 2022, 1:04 PM +1000, J.D. Haltigan <jhaltiga using gmail.com<mailto:jhaltiga using gmail.com>>, wrote:
This cross-validated post seemed helpful and aligned with everything
previously mentioned here, and I wanted to share.
https://stats.stackexchange.com/questions/190763/how-to-decide-which-glm-family-to-use<https://stats.stackexchange.com/questions/190763/how-to-decide-which-glm-family-to-use>

Thanks for all of your insights as this is quite a deep dive for me.

To be sure: in glmer, when one specifies the GLM family, that then applies
to both fixed & random effects, correct? In other words, there is no way to
separately specify a distribution for the random effects in glmer? I know I
asked this question to Ben on the lme4 Git, but I wanted to confirm again.

JD

On Wed, Jul 20, 2022 at 8:35 PM Andrew Robinson <apro using unimelb.edu.au<mailto:apro using unimelb.edu.au>> wrote:

Ben is correct that the traditional alignment of certain link functions
with certain members of the exponential family (the so-called 'canonical'
links) are merely for analytical convenience and have no statistical
justification.

Canonical link functions exercise an unhealthy influence on statistical
practice, IMO. OTOH, there are circumstances in which they may afford
parameter estimates that are easier to interpret.

I'm not familiar with hglm but I suspect you may need to use the log-link
function for the random effects in order to get something like the glmer
output.

Cheers,

Andrew

--
Andrew Robinson
Chief Executive Officer, CEBRA and Professor of Biosecurity,
School/s of BioSciences and Mathematics & Statistics
University of Melbourne, VIC 3010 Australia
Tel: (+61) 0403 138 955
Email: apro using unimelb.edu.au<mailto:apro using unimelb.edu.au>
Website: https://researchers.ms.unimelb.edu.au/~apro@unimelb/

I acknowledge the Traditional Owners of the land I inhabit, and pay my
respects to their Elders.
On 21 Jul 2022, 10:24 AM +1000, J.D. Haltigan <jhaltiga using gmail.com<mailto:jhaltiga using gmail.com>>, wrote:

Thanks, Ben. For example, here is a comparison of glmer & HGLM using
Guassian random effects for HGLM as suggested by Andrew. The magnitude of
the point estimates are about the same, but the nominal significance for
'treatment' is only significant when considering the Gaussian random
effects from lmer.

NB: In the project I am working on, the point is to show sensitivity of
results to different parameterizations of models. I understand the point
estimate magnitudes are about the same, but that is less relevant to this
exercise.

*lme4_5_B = glmer(posXsymp~ treatment+proper_mask_base+prop_resp_ill_base_2
+ pairID + (1 | union), family = "poisson", nAGQ=0, data = bdata.raw3)#lme4
package using glmer*

summary(lme4_5_B)

Generalized linear mixed model fit by maximum likelihood (Adaptive
Gauss-Hermite Quadrature, nAGQ = 0) ['glmerMod']
Family: poisson ( log )
Formula: posXsymp ~ treatment + proper_mask_base + prop_resp_ill_base_2 +
pairID + (1 | union)
Data: bdata.raw3

AIC BIC logLik deviance df.resid
25054.5 27939.7 -12254.2 24508.5 287075

Scaled residuals:
Min 1Q Median 3Q Max
-0.214597 -0.100237 -0.078562 -0.054499 40.646525

Random effects:
Groups Name Variance Std.Dev.
union (Intercept) 0.007269442 0.08526102
Number of obs: 287348, groups: union, 538

Fixed effects:
Estimate Std. Error z value
Pr(>|z|)
(Intercept) -5.987724526 0.581280589 -10.30092 <
0.000000000000000222 ***
treatment -0.094657198 0.044546951 -2.12489
0.03359612 *

***********************************************************************************************************************************************************************************************************************************

*HGLM2_5_Test = hglm2(posXsymp~
treatment+proper_mask_base+prop_resp_ill_base_2 + pairID + (1 | union),
family =poisson (link = log), rand.family =gaussian(link=identity),
data = bdata.raw3)#HGLM package using hglm2*

summary(HGLM2_5_Test)

Call:
hglm2.formula(meanmodel = posXsymp ~ treatment + proper_mask_base +
prop_resp_ill_base_2 + pairID + (1 | union), data = bdata.raw3,
family = poisson(link = log), rand.family = gaussian(link = identity))

----------
MEAN MODEL
----------

Summary of the fixed effects estimates:

Estimate Std. Error t-value Pr(>|t|)
(Intercept) -6.02638 1.09414 -5.508 0.0000000364 ***
treatment -0.02977 0.13624 -0.218 0.8271

On Wed, Jul 20, 2022 at 10:42 AM Ben Bolker <bbolker using gmail.com<mailto:bbolker using gmail.com>> wrote:

I believe that there are some mathematically natural pairings (i.e. a
Gamma random effect + a Poisson response, a Beta-distributed random
effect + a binomial response), but I don't know if there's much
theoretical justification other than analytical convenience.

(If you have a categorical response then the natural random effect
would be Dirichlet, i.e. a Dirichlet-multinomial marginal distribution,
but do you really want to go down that rabbit hole?)

I strongly second Andrew's recommendation to check that the
difference is not a difference between packages/estimation procedures.

On 2022-07-20 5:26 a.m., Andrew Robinson wrote:

You should really also verify that the hglm is doing what you expect by

fitting with Gaussian random effects.

The random effects in glmer are Gaussian. The effects enter the model

via the linear predictor, which is then translated to the mean function of
the chosen distribution of the response variable via the link function (NB
this is a conceptual explanation, not an algorithmic one).

Cheers,

Andrew
On 20 Jul 2022, 6:40 PM +1000, J.D. Haltigan <jhaltiga using gmail.com<mailto:jhaltiga using gmail.com>>,

wrote:

External email: Please exercise caution

________________________________
Thanks, I will inspect the BLUPS.

Re: choice of distribution, what I meant was, for example, if my fixed

effects family is estimated using a Poisson model, does that inform the
choice for random effects as well? (i.e., why would one invoke a gaussian
distribution for random effects if the response variable is, say,
categorical, or a count?)

On Wed, Jul 20, 2022 at 3:40 AM Andrew Robinson <apro using unimelb.edu.au<mailto:apro using unimelb.edu.au>

<mailto:apro using unimelb.edu.au<mailto:apro using unimelb.edu.au>>> wrote:

You can use a qq-plot of the BLUPS to guide that decision.

The difference might also be due to something else, though. Have you

tried the call to hglm2 with a gaussian random effects family? Does it
give the same output as the glmer?

For: does the choice of distribution for the fixed effects portion of

the model inform the choice for the random effects? I’m not sure what you
mean - do you mean the exponential family for the response variable?

Cheers,

Andrew

--
Andrew Robinson
Chief Executive Officer, CEBRA and Professor of Biosecurity,
School/s of BioSciences and Mathematics & Statistics
University of Melbourne, VIC 3010 Australia
Tel: (+61) 0403 138 955
Email: apro using unimelb.edu.au<mailto:apro using unimelb.edu.au><mailto:apro using unimelb.edu.au<mailto:apro using unimelb.edu.au>>
Website: https://researchers.ms.unimelb.edu.au/~apro@unimelb/

I acknowledge the Traditional Owners of the land I inhabit, and pay my

respects to their Elders.

On 20 Jul 2022, 2:27 PM +1000, J.D. Haltigan <jhaltiga using gmail.com<mailto:jhaltiga using gmail.com><mailto:

jhaltiga using gmail.com<mailto:jhaltiga using gmail.com>>>, wrote:

Hi:

Is there clear best practice or guidance when it comes to choosing the
distribution of random effects where multiple choices exist (e.g.,
gaussian, gamma, etc.)? I ask in the context of extending some analyses
from an RCT in which the outcome is symptomatic seropositivity (so a
count). The random effects I am modeling are village cluster [union]

(it's

a cluster randomized trial). I get different results (significance-wise)
depending on whether I choose a normal or gamma distribution for the

random

effects.

The basic model (proportional outcome) is:

lme4_5_B = glmer(posXsymp~

treatment+proper_mask_base+prop_resp_ill_base_2

+ pairID + (1 | union), family = "poisson", nAGQ=0, data =

bdata.raw3)#lme4

package using glmer

HGLM2_5_A = hglm2(posXsymp~

treatment+proper_mask_base+prop_resp_ill_base_2

+ pairID + (1 | union), family =poisson (link = log), rand.family
=Gamma(link=log),
data = bdata.raw3)#HGLM package using hglm2

Does, for example, the choice of distribution for the fixed effects

portion

of the model inform the choice for the random effects?

Thank you for any insights.

Best regards,
J.D.

[[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models using r-project.org<mailto:R-sig-mixed-models using r-project.org><mailto:R-sig-mixed-models using r-project.org<mailto:R-sig-mixed-models using r-project.org>>

mailing list

https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models><

https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>>

[[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models using r-project.org<mailto:R-sig-mixed-models using r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>

--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics

_______________________________________________
R-sig-mixed-models using r-project.org<mailto:R-sig-mixed-models using r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>

[[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models using r-project.org<mailto:R-sig-mixed-models using r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>

[[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models using r-project.org<mailto:R-sig-mixed-models using r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>

[[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models using r-project.org<mailto:R-sig-mixed-models using r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>

--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics

_______________________________________________
R-sig-mixed-models using r-project.org<mailto:R-sig-mixed-models using r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models using r-project.org<mailto:R-sig-mixed-models using r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models<https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>

	[[alternative HTML version deleted]]