[R-sig-ME] [EXT] Re: Choice of distribution for random effects

Thu Jul 21 18:03:32 CEST 2022

   I want to clarify one tiny thing.

   The family typically specifies the *link function* as well as the 
conditional distribution, although this is often done by default (e.g. 
poisson -> log, binomial -> logit/log-odds).  The link function 
specifies the scale on which the data are fitted, so although the random 
effects are always Gaussian *on the scale of the linear predictor* 
(i.e., the transformed scale), they vary on the scale of the data.  If 
the distribution of group-level intercepts is Gaussian on the log scale, 
then it's log-Normal on the data (count) scale.

On 2022-07-20 11:36 p.m., Andrew Robinson wrote:
> That is correct.
> 
> Cheers,
> 
> Andrew
> 
> --
> Andrew Robinson
> Chief Executive Officer, CEBRA and Professor of Biosecurity,
> School/s of BioSciences and Mathematics & Statistics
> University of Melbourne, VIC 3010 Australia
> Tel: (+61) 0403 138 955
> Email: apro using unimelb.edu.au
> Website: https://researchers.ms.unimelb.edu.au/~apro@unimelb/
> 
> I acknowledge the Traditional Owners of the land I inhabit, and pay my respects to their Elders.
> On 21 Jul 2022, 1:04 PM +1000, J.D. Haltigan <jhaltiga using gmail.com>, wrote:
> This cross-validated post seemed helpful and aligned with everything
> previously mentioned here, and I wanted to share.
> https://stats.stackexchange.com/questions/190763/how-to-decide-which-glm-family-to-use
> 
> Thanks for all of your insights as this is quite a deep dive for me.
> 
> To be sure: in glmer, when one specifies the GLM family, that then applies
> to both fixed & random effects, correct? In other words, there is no way to
> separately specify a distribution for the random effects in glmer? I know I
> asked this question to Ben on the lme4 Git, but I wanted to confirm again.
> 
> JD
> 
> On Wed, Jul 20, 2022 at 8:35 PM Andrew Robinson <apro using unimelb.edu.au> wrote:
> 
> Ben is correct that the traditional alignment of certain link functions
> with certain members of the exponential family (the so-called 'canonical'
> links) are merely for analytical convenience and have no statistical
> justification.
> 
> Canonical link functions exercise an unhealthy influence on statistical
> practice, IMO. OTOH, there are circumstances in which they may afford
> parameter estimates that are easier to interpret.
> 
> I'm not familiar with hglm but I suspect you may need to use the log-link
> function for the random effects in order to get something like the glmer
> output.
> 
> Cheers,
> 
> Andrew
> 
> --
> Andrew Robinson
> Chief Executive Officer, CEBRA and Professor of Biosecurity,
> School/s of BioSciences and Mathematics & Statistics
> University of Melbourne, VIC 3010 Australia
> Tel: (+61) 0403 138 955
> Email: apro using unimelb.edu.au
> Website: https://researchers.ms.unimelb.edu.au/~apro@unimelb/
> 
> I acknowledge the Traditional Owners of the land I inhabit, and pay my
> respects to their Elders.
> On 21 Jul 2022, 10:24 AM +1000, J.D. Haltigan <jhaltiga using gmail.com>, wrote:
> 
> Thanks, Ben. For example, here is a comparison of glmer & HGLM using
> Guassian random effects for HGLM as suggested by Andrew. The magnitude of
> the point estimates are about the same, but the nominal significance for
> 'treatment' is only significant when considering the Gaussian random
> effects from lmer.
> 
> NB: In the project I am working on, the point is to show sensitivity of
> results to different parameterizations of models. I understand the point
> estimate magnitudes are about the same, but that is less relevant to this
> exercise.
> 
> *lme4_5_B = glmer(posXsymp~ treatment+proper_mask_base+prop_resp_ill_base_2
> + pairID + (1 | union), family = "poisson", nAGQ=0, data = bdata.raw3)#lme4
> package using glmer*
> 
> summary(lme4_5_B)
> 
> Generalized linear mixed model fit by maximum likelihood (Adaptive
> Gauss-Hermite Quadrature, nAGQ = 0) ['glmerMod']
> Family: poisson ( log )
> Formula: posXsymp ~ treatment + proper_mask_base + prop_resp_ill_base_2 +
> pairID + (1 | union)
> Data: bdata.raw3
> 
> AIC BIC logLik deviance df.resid
> 25054.5 27939.7 -12254.2 24508.5 287075
> 
> Scaled residuals:
> Min 1Q Median 3Q Max
> -0.214597 -0.100237 -0.078562 -0.054499 40.646525
> 
> Random effects:
> Groups Name Variance Std.Dev.
> union (Intercept) 0.007269442 0.08526102
> Number of obs: 287348, groups: union, 538
> 
> Fixed effects:
> Estimate Std. Error z value
> Pr(>|z|)
> (Intercept) -5.987724526 0.581280589 -10.30092 <
> 0.000000000000000222 ***
> treatment -0.094657198 0.044546951 -2.12489
> 0.03359612 *
> 
> 
> ***********************************************************************************************************************************************************************************************************************************
> 
> *HGLM2_5_Test = hglm2(posXsymp~
> treatment+proper_mask_base+prop_resp_ill_base_2 + pairID + (1 | union),
> family =poisson (link = log), rand.family =gaussian(link=identity),
> data = bdata.raw3)#HGLM package using hglm2*
> 
> summary(HGLM2_5_Test)
> 
> Call:
> hglm2.formula(meanmodel = posXsymp ~ treatment + proper_mask_base +
> prop_resp_ill_base_2 + pairID + (1 | union), data = bdata.raw3,
> family = poisson(link = log), rand.family = gaussian(link = identity))
> 
> ----------
> MEAN MODEL
> ----------
> 
> Summary of the fixed effects estimates:
> 
> Estimate Std. Error t-value Pr(>|t|)
> (Intercept) -6.02638 1.09414 -5.508 0.0000000364 ***
> treatment -0.02977 0.13624 -0.218 0.8271
> 
> On Wed, Jul 20, 2022 at 10:42 AM Ben Bolker <bbolker using gmail.com> wrote:
> 
> I believe that there are some mathematically natural pairings (i.e. a
> Gamma random effect + a Poisson response, a Beta-distributed random
> effect + a binomial response), but I don't know if there's much
> theoretical justification other than analytical convenience.
> 
> (If you have a categorical response then the natural random effect
> would be Dirichlet, i.e. a Dirichlet-multinomial marginal distribution,
> but do you really want to go down that rabbit hole?)
> 
> I strongly second Andrew's recommendation to check that the
> difference is not a difference between packages/estimation procedures.
> 
> On 2022-07-20 5:26 a.m., Andrew Robinson wrote:
> 
> You should really also verify that the hglm is doing what you expect by
> 
> fitting with Gaussian random effects.
> 
> 
> The random effects in glmer are Gaussian. The effects enter the model
> 
> via the linear predictor, which is then translated to the mean function of
> the chosen distribution of the response variable via the link function (NB
> this is a conceptual explanation, not an algorithmic one).
> 
> 
> Cheers,
> 
> Andrew
> On 20 Jul 2022, 6:40 PM +1000, J.D. Haltigan <jhaltiga using gmail.com>,
> 
> wrote:
> 
> External email: Please exercise caution
> 
> ________________________________
> Thanks, I will inspect the BLUPS.
> 
> Re: choice of distribution, what I meant was, for example, if my fixed
> 
> effects family is estimated using a Poisson model, does that inform the
> choice for random effects as well? (i.e., why would one invoke a gaussian
> distribution for random effects if the response variable is, say,
> categorical, or a count?)
> 
> 
> On Wed, Jul 20, 2022 at 3:40 AM Andrew Robinson <apro using unimelb.edu.au
> 
> <mailto:apro using unimelb.edu.au>> wrote:
> 
> You can use a qq-plot of the BLUPS to guide that decision.
> 
> The difference might also be due to something else, though. Have you
> 
> tried the call to hglm2 with a gaussian random effects family? Does it
> give the same output as the glmer?
> 
> 
> For: does the choice of distribution for the fixed effects portion of
> 
> the model inform the choice for the random effects? I’m not sure what you
> mean - do you mean the exponential family for the response variable?
> 
> 
> 
> Cheers,
> 
> Andrew
> 
> --
> Andrew Robinson
> Chief Executive Officer, CEBRA and Professor of Biosecurity,
> School/s of BioSciences and Mathematics & Statistics
> University of Melbourne, VIC 3010 Australia
> Tel: (+61) 0403 138 955
> Email: apro using unimelb.edu.au<mailto:apro using unimelb.edu.au>
> Website: https://researchers.ms.unimelb.edu.au/~apro@unimelb/
> 
> I acknowledge the Traditional Owners of the land I inhabit, and pay my
> 
> respects to their Elders.
> 
> On 20 Jul 2022, 2:27 PM +1000, J.D. Haltigan <jhaltiga using gmail.com<mailto:
> 
> jhaltiga using gmail.com>>, wrote:
> 
> Hi:
> 
> Is there clear best practice or guidance when it comes to choosing the
> distribution of random effects where multiple choices exist (e.g.,
> gaussian, gamma, etc.)? I ask in the context of extending some analyses
> from an RCT in which the outcome is symptomatic seropositivity (so a
> count). The random effects I am modeling are village cluster [union]
> 
> (it's
> 
> a cluster randomized trial). I get different results (significance-wise)
> depending on whether I choose a normal or gamma distribution for the
> 
> random
> 
> effects.
> 
> The basic model (proportional outcome) is:
> 
> lme4_5_B = glmer(posXsymp~
> 
> treatment+proper_mask_base+prop_resp_ill_base_2
> 
> + pairID + (1 | union), family = "poisson", nAGQ=0, data =
> 
> bdata.raw3)#lme4
> 
> package using glmer
> 
> 
> HGLM2_5_A = hglm2(posXsymp~
> 
> treatment+proper_mask_base+prop_resp_ill_base_2
> 
> + pairID + (1 | union), family =poisson (link = log), rand.family
> =Gamma(link=log),
> data = bdata.raw3)#HGLM package using hglm2
> 
> Does, for example, the choice of distribution for the fixed effects
> 
> portion
> 
> of the model inform the choice for the random effects?
> 
> Thank you for any insights.
> 
> Best regards,
> J.D.
> 
> [[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org<mailto:R-sig-mixed-models using r-project.org>
> 
> mailing list
> 
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models<
> 
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
> 
> 
> 
> [[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> 
> --
> Dr. Benjamin Bolker
> Professor, Mathematics & Statistics and Biology, McMaster University
> Director, School of Computational Science and Engineering
> (Acting) Graduate chair, Mathematics & Statistics
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> 
> [[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> 
> 
> [[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-- 
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics