[R-sig-ME] [EXT] Re: Choice of distribution for random effects
J.D. Haltigan
jh@|t|g@ @end|ng |rom gm@||@com
Fri Jul 22 00:51:46 CEST 2022
Thanks for the clarification. It is reassuring at this point to know that
the fixed portion 'of the model(s)' is the same.
On Thu, Jul 21, 2022 at 6:48 PM Andrew Robinson <apro using unimelb.edu.au> wrote:
> Well, earlier you wrote "when I try to use HGLM to fit this model”. It’s
> not the same model. The fixed portion is the same but the random portion is
> not. If you want to know why hglm2 is failing to fit your different model,
> that’s a different question. Possible the families that you elect are not
> compatible with the data.
>
> I don’t have an opinion on the question because it is not framed in a
> language that is familiar to me.
>
> Andrew
>
> --
> Andrew Robinson
> Chief Executive Officer, CEBRA and Professor of Biosecurity,
> School/s of BioSciences and Mathematics & Statistics
> University of Melbourne, VIC 3010 Australia
> Tel: (+61) 0403 138 955
> Email: apro using unimelb.edu.au
> Website: https://researchers.ms.unimelb.edu.au/~apro@unimelb/
>
> I acknowledge the Traditional Owners of the land I inhabit, and pay my
> respects to their Elders.
> On 22 Jul 2022, 8:36 AM +1000, J.D. Haltigan <jhaltiga using gmail.com>, wrote:
>
> *External email:* Please exercise caution
> ------------------------------
> Correct. The idea was to replicate the lmer *with the addition of a random
> effects distribution* which is not possible in lmer (to see what numeric
> differences in estimate effects might arise).
>
> So, the fixed portion of the model, if I understand lmer correctly, should
> be the same here. Whether I specify Beta or Gamma for the random effects
> (in hglm2) yields the same error. I am wondering what might be the reason
> for this.
>
> At the risk of asking an elementary question here: When I add the random
> effects specification for this model, I understand it as accounting for the
> B/W cluster variance (union). In effect, by adding this random effects
> component, I am saying that unions are *not* interchangeable and are drawn
> from a universe of random draws of unions. I am borrowing a bit from the
> parlance of Generalizability Theory, so apologies in advance if my language
> is a bit cryptic.
>
> JD
>
> On Thu, Jul 21, 2022 at 6:11 PM Andrew Robinson <apro using unimelb.edu.au>
> wrote:
>
>> As noted earlier, I’m not familiar with hglm, but a cursory glance
>> suggests that those are not the same model. You appear to have specified
>> Beta distributed random effects in the call to hglm2.
>>
>> Cheers,
>>
>> Andrew
>>
>> --
>> Andrew Robinson
>> Chief Executive Officer, CEBRA and Professor of Biosecurity,
>> School/s of BioSciences and Mathematics & Statistics
>> University of Melbourne, VIC 3010 Australia
>> Tel: (+61) 0403 138 955
>> Email: apro using unimelb.edu.au
>> Website: https://researchers.ms.unimelb.edu.au/~apro@unimelb/
>>
>> I acknowledge the Traditional Owners of the land I inhabit, and pay my
>> respects to their Elders.
>> On 22 Jul 2022, 7:45 AM +1000, J.D. Haltigan <jhaltiga using gmail.com>, wrote:
>>
>> *External email:* Please exercise caution
>> ------------------------------
>> Thanks for this continued deep dive as it is very instructive for me. I
>> am currently also reading the Wood GAM book which is very helpful.
>>
>> As an aside, and perhaps relevant to the link discourse, in the current
>> project I am working on, I fit an lmer with:
>>
>> ***lme4_1_B =
>> lmer(posXsymp~treatment+proper_mask_base+prop_resp_ill_base_2 + pairID + (1
>> | union), data = bdata.raw1)#lme4 package***
>> which converges and provides results that are sensible (compared to
>> models that have been run using fixed effects only).
>>
>> However, when I try to use HGLM to fit this model, it does not want to
>> cooperate and I am wondering why. If I specify the HGLM as should be the
>> case for linear models with family = Guassian, as:
>>
>> ***HGLM2_1_A = hglm2(posXsymp~
>> treatment+proper_mask_base+prop_resp_ill_base_2 + pairID + (1 | union),
>> family=gaussian(link=identity),
>> rand.family=Beta(link=logit), maxit = 100, data =
>> bdata.raw1)#HGLM package using hglm2***
>>
>> I get: Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
>> 1, 1, :
>> NA/NaN/Inf in 'x'
>> In addition: Warning message:
>> In hglm.default(X = X, y = Y, Z = Z, family = family, rand.family =
>> rand.family, :
>> Residuals numerically 0 are replaced by 1e-8
>> >
>>
>> I am wondering if this is b/c "pairID" is a factor variable. I simply
>> don't understand why lmer would fit this but HGLM returns that error.
>>
>> On Thu, Jul 21, 2022 at 4:18 PM Andrew Robinson <apro using unimelb.edu.au>
>> wrote:
>>
>>> And, with my apologies for being fastidious, as you say the family
>>> typically specifies the link function *in the software*, selecting the
>>> (so-called) canonical link.
>>>
>>> That doesn’t make it the right link function for any given modeling
>>> exercise. See, e.g., https://aip.scitation.org/doi/pdf/10.1063/1.5139815,
>>> and particularly Figure 1.
>>>
>>> As Joe Hilbe and I wrote, it is not clear that there is any statistical
>>> benefit in the canonization of a particular link function for each family.
>>>
>>> Cheers,
>>>
>>> Andrew
>>> On 22 Jul 2022, 2:04 AM +1000, Ben Bolker <bbolker using gmail.com>, wrote:
>>> I want to clarify one tiny thing.
>>>
>>> The family typically specifies the *link function* as well as the
>>> conditional distribution, although this is often done by default (e.g.
>>> poisson -> log, binomial -> logit/log-odds). The link function
>>> specifies the scale on which the data are fitted, so although the random
>>> effects are always Gaussian *on the scale of the linear predictor*
>>> (i.e., the transformed scale), they vary on the scale of the data. If
>>> the distribution of group-level intercepts is Gaussian on the log scale,
>>> then it's log-Normal on the data (count) scale.
>>>
>>>
>>> On 2022-07-20 11:36 p.m., Andrew Robinson wrote:
>>> That is correct.
>>>
>>> Cheers,
>>>
>>> Andrew
>>>
>>> --
>>> Andrew Robinson
>>> Chief Executive Officer, CEBRA and Professor of Biosecurity,
>>> School/s of BioSciences and Mathematics & Statistics
>>> University of Melbourne, VIC 3010 Australia
>>> Tel: (+61) 0403 138 955
>>> Email: apro using unimelb.edu.au
>>> Website: https://researchers.ms.unimelb.edu.au/~apro@unimelb/
>>>
>>> I acknowledge the Traditional Owners of the land I inhabit, and pay my
>>> respects to their Elders.
>>> On 21 Jul 2022, 1:04 PM +1000, J.D. Haltigan <jhaltiga using gmail.com>,
>>> wrote:
>>> This cross-validated post seemed helpful and aligned with everything
>>> previously mentioned here, and I wanted to share.
>>>
>>> https://stats.stackexchange.com/questions/190763/how-to-decide-which-glm-family-to-use
>>>
>>> Thanks for all of your insights as this is quite a deep dive for me.
>>>
>>> To be sure: in glmer, when one specifies the GLM family, that then
>>> applies
>>> to both fixed & random effects, correct? In other words, there is no way
>>> to
>>> separately specify a distribution for the random effects in glmer? I
>>> know I
>>> asked this question to Ben on the lme4 Git, but I wanted to confirm
>>> again.
>>>
>>> JD
>>>
>>> On Wed, Jul 20, 2022 at 8:35 PM Andrew Robinson <apro using unimelb.edu.au>
>>> wrote:
>>>
>>> Ben is correct that the traditional alignment of certain link functions
>>> with certain members of the exponential family (the so-called 'canonical'
>>> links) are merely for analytical convenience and have no statistical
>>> justification.
>>>
>>> Canonical link functions exercise an unhealthy influence on statistical
>>> practice, IMO. OTOH, there are circumstances in which they may afford
>>> parameter estimates that are easier to interpret.
>>>
>>> I'm not familiar with hglm but I suspect you may need to use the log-link
>>> function for the random effects in order to get something like the glmer
>>> output.
>>>
>>> Cheers,
>>>
>>> Andrew
>>>
>>> --
>>> Andrew Robinson
>>> Chief Executive Officer, CEBRA and Professor of Biosecurity,
>>> School/s of BioSciences and Mathematics & Statistics
>>> University of Melbourne, VIC 3010 Australia
>>> Tel: (+61) 0403 138 955
>>> Email: apro using unimelb.edu.au
>>> Website: https://researchers.ms.unimelb.edu.au/~apro@unimelb/
>>>
>>> I acknowledge the Traditional Owners of the land I inhabit, and pay my
>>> respects to their Elders.
>>> On 21 Jul 2022, 10:24 AM +1000, J.D. Haltigan <jhaltiga using gmail.com>,
>>> wrote:
>>>
>>> Thanks, Ben. For example, here is a comparison of glmer & HGLM using
>>> Guassian random effects for HGLM as suggested by Andrew. The magnitude of
>>> the point estimates are about the same, but the nominal significance for
>>> 'treatment' is only significant when considering the Gaussian random
>>> effects from lmer.
>>>
>>> NB: In the project I am working on, the point is to show sensitivity of
>>> results to different parameterizations of models. I understand the point
>>> estimate magnitudes are about the same, but that is less relevant to this
>>> exercise.
>>>
>>> *lme4_5_B = glmer(posXsymp~
>>> treatment+proper_mask_base+prop_resp_ill_base_2
>>> + pairID + (1 | union), family = "poisson", nAGQ=0, data =
>>> bdata.raw3)#lme4
>>> package using glmer*
>>>
>>> summary(lme4_5_B)
>>>
>>> Generalized linear mixed model fit by maximum likelihood (Adaptive
>>> Gauss-Hermite Quadrature, nAGQ = 0) ['glmerMod']
>>> Family: poisson ( log )
>>> Formula: posXsymp ~ treatment + proper_mask_base + prop_resp_ill_base_2 +
>>> pairID + (1 | union)
>>> Data: bdata.raw3
>>>
>>> AIC BIC logLik deviance df.resid
>>> 25054.5 27939.7 -12254.2 24508.5 287075
>>>
>>> Scaled residuals:
>>> Min 1Q Median 3Q Max
>>> -0.214597 -0.100237 -0.078562 -0.054499 40.646525
>>>
>>> Random effects:
>>> Groups Name Variance Std.Dev.
>>> union (Intercept) 0.007269442 0.08526102
>>> Number of obs: 287348, groups: union, 538
>>>
>>> Fixed effects:
>>> Estimate Std. Error z value
>>> Pr(>|z|)
>>> (Intercept) -5.987724526 0.581280589 -10.30092 <
>>> 0.000000000000000222 ***
>>> treatment -0.094657198 0.044546951 -2.12489
>>> 0.03359612 *
>>>
>>>
>>>
>>> ***********************************************************************************************************************************************************************************************************************************
>>>
>>> *HGLM2_5_Test = hglm2(posXsymp~
>>> treatment+proper_mask_base+prop_resp_ill_base_2 + pairID + (1 | union),
>>> family =poisson (link = log), rand.family =gaussian(link=identity),
>>> data = bdata.raw3)#HGLM package using hglm2*
>>>
>>> summary(HGLM2_5_Test)
>>>
>>> Call:
>>> hglm2.formula(meanmodel = posXsymp ~ treatment + proper_mask_base +
>>> prop_resp_ill_base_2 + pairID + (1 | union), data = bdata.raw3,
>>> family = poisson(link = log), rand.family = gaussian(link = identity))
>>>
>>> ----------
>>> MEAN MODEL
>>> ----------
>>>
>>> Summary of the fixed effects estimates:
>>>
>>> Estimate Std. Error t-value Pr(>|t|)
>>> (Intercept) -6.02638 1.09414 -5.508 0.0000000364 ***
>>> treatment -0.02977 0.13624 -0.218 0.8271
>>>
>>> On Wed, Jul 20, 2022 at 10:42 AM Ben Bolker <bbolker using gmail.com> wrote:
>>>
>>> I believe that there are some mathematically natural pairings (i.e. a
>>> Gamma random effect + a Poisson response, a Beta-distributed random
>>> effect + a binomial response), but I don't know if there's much
>>> theoretical justification other than analytical convenience.
>>>
>>> (If you have a categorical response then the natural random effect
>>> would be Dirichlet, i.e. a Dirichlet-multinomial marginal distribution,
>>> but do you really want to go down that rabbit hole?)
>>>
>>> I strongly second Andrew's recommendation to check that the
>>> difference is not a difference between packages/estimation procedures.
>>>
>>> On 2022-07-20 5:26 a.m., Andrew Robinson wrote:
>>>
>>> You should really also verify that the hglm is doing what you expect by
>>>
>>> fitting with Gaussian random effects.
>>>
>>>
>>> The random effects in glmer are Gaussian. The effects enter the model
>>>
>>> via the linear predictor, which is then translated to the mean function
>>> of
>>> the chosen distribution of the response variable via the link function
>>> (NB
>>> this is a conceptual explanation, not an algorithmic one).
>>>
>>>
>>> Cheers,
>>>
>>> Andrew
>>> On 20 Jul 2022, 6:40 PM +1000, J.D. Haltigan <jhaltiga using gmail.com>,
>>>
>>> wrote:
>>>
>>> External email: Please exercise caution
>>>
>>> ________________________________
>>> Thanks, I will inspect the BLUPS.
>>>
>>> Re: choice of distribution, what I meant was, for example, if my fixed
>>>
>>> effects family is estimated using a Poisson model, does that inform the
>>> choice for random effects as well? (i.e., why would one invoke a gaussian
>>> distribution for random effects if the response variable is, say,
>>> categorical, or a count?)
>>>
>>>
>>> On Wed, Jul 20, 2022 at 3:40 AM Andrew Robinson <apro using unimelb.edu.au
>>>
>>> <mailto:apro using unimelb.edu.au>> wrote:
>>>
>>> You can use a qq-plot of the BLUPS to guide that decision.
>>>
>>> The difference might also be due to something else, though. Have you
>>>
>>> tried the call to hglm2 with a gaussian random effects family? Does it
>>> give the same output as the glmer?
>>>
>>>
>>> For: does the choice of distribution for the fixed effects portion of
>>>
>>> the model inform the choice for the random effects? I’m not sure what you
>>> mean - do you mean the exponential family for the response variable?
>>>
>>>
>>>
>>> Cheers,
>>>
>>> Andrew
>>>
>>> --
>>> Andrew Robinson
>>> Chief Executive Officer, CEBRA and Professor of Biosecurity,
>>> School/s of BioSciences and Mathematics & Statistics
>>> University of Melbourne, VIC 3010 Australia
>>> Tel: (+61) 0403 138 955
>>> Email: apro using unimelb.edu.au<mailto:apro using unimelb.edu.au>
>>> Website: https://researchers.ms.unimelb.edu.au/~apro@unimelb/
>>>
>>> I acknowledge the Traditional Owners of the land I inhabit, and pay my
>>>
>>> respects to their Elders.
>>>
>>> On 20 Jul 2022, 2:27 PM +1000, J.D. Haltigan <jhaltiga using gmail.com<mailto:
>>>
>>> jhaltiga using gmail.com>>, wrote:
>>>
>>> Hi:
>>>
>>> Is there clear best practice or guidance when it comes to choosing the
>>> distribution of random effects where multiple choices exist (e.g.,
>>> gaussian, gamma, etc.)? I ask in the context of extending some analyses
>>> from an RCT in which the outcome is symptomatic seropositivity (so a
>>> count). The random effects I am modeling are village cluster [union]
>>>
>>> (it's
>>>
>>> a cluster randomized trial). I get different results (significance-wise)
>>> depending on whether I choose a normal or gamma distribution for the
>>>
>>> random
>>>
>>> effects.
>>>
>>> The basic model (proportional outcome) is:
>>>
>>> lme4_5_B = glmer(posXsymp~
>>>
>>> treatment+proper_mask_base+prop_resp_ill_base_2
>>>
>>> + pairID + (1 | union), family = "poisson", nAGQ=0, data =
>>>
>>> bdata.raw3)#lme4
>>>
>>> package using glmer
>>>
>>>
>>> HGLM2_5_A = hglm2(posXsymp~
>>>
>>> treatment+proper_mask_base+prop_resp_ill_base_2
>>>
>>> + pairID + (1 | union), family =poisson (link = log), rand.family
>>> =Gamma(link=log),
>>> data = bdata.raw3)#HGLM package using hglm2
>>>
>>> Does, for example, the choice of distribution for the fixed effects
>>>
>>> portion
>>>
>>> of the model inform the choice for the random effects?
>>>
>>> Thank you for any insights.
>>>
>>> Best regards,
>>> J.D.
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org<mailto:R-sig-mixed-models using r-project.org
>>> >
>>>
>>> mailing list
>>>
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models<
>>>
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>>
>>> --
>>> Dr. Benjamin Bolker
>>> Professor, Mathematics & Statistics and Biology, McMaster University
>>> Director, School of Computational Science and Engineering
>>> (Acting) Graduate chair, Mathematics & Statistics
>>>
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>> --
>>> Dr. Benjamin Bolker
>>> Professor, Mathematics & Statistics and Biology, McMaster University
>>> Director, School of Computational Science and Engineering
>>> (Acting) Graduate chair, Mathematics & Statistics
>>>
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list