[R-sig-ME] glmer Z-test with individual random effects

Jarrod Hadfield j.hadfield at ed.ac.uk
Sat Nov 13 12:23:08 CET 2010


Hi,

I think this is a hard question to answer for complex models  
(otherwise MCMC would be redundant) but in simple situations the  
inverse-Wishart prior can be thought of as adding nu effects with  
known squared deviation V.

For example, we could have 5 levels of some factor (rfac) and observe  
10 data with pairs of observations associated with each level of rfac:

rfac<-gl(5,2)
e.rfac<-rnorm(5, 0, sqrt(0.1))
y<-rnorm(10)+e.rfac[rfac]
dat<-data.frame(y=y, rfac=rfac)

Here the intercept is 0.0, the residual variance is 1.0 and the  
variance of effects associated with rfac is 0.1. I'll call this last  
variance Vg.  We can fix the posterior for all parameters at their  
true values except Vg which has an IW prior with V=1 and nu=1:

prior1=list(B=list(mu=0, V=1e-6), R=list(V=1, fix=1),  
G=list(G1=list(V=1, nu=1)))

m1<-MCMCglmm(y~1, random=~rfac, data=dat, prior=prior1)

We could also observe an additional 1000 data points associated with a  
6th level of rfac which has an effect of 1.0:

  dat2<-data.frame(y=c(y,rnorm(1000)+1), rfac=c(rfac,rep(6, 1000)))

and analyse as before but with a flat prior (i.e. nu=0):

prior2=list(B=list(mu=0, V=1e-6), R=list(V=1, fix=1),  
G=list(G1=list(V=1, nu=0)))

m2<-MCMCglmm(y~1, random=~rfac, data=dat, prior=prior2)

The posterior distribution for Vg should be (almost) the same in both  
cases

plot(density(m1$VCV[,1]))
lines(density(m2$VCV[,1]), col="red")

With multi-parameter models, or quantities that are functions of two  
or more parameters, then this interpretation can be misleading.  
However, in many instances the idea does seem to serve as a good  
heuristic. The danger of course is thinking how can adding 1 prior  
effect (i.e. nu =1) make that much difference (compared to nu=0, say).  
It can when a) factor levels are poorly replicated b) replication  
within factor levels is poor and c) when prior V conflicts with the  
actual value of Vg.

Cheers,

Jarrod






On 13 Nov 2010, at 00:42, John Maindonald wrote:

> HI Jarrod -
> The mechanics of what is happening are not my issue.  What bothers
> me is that I do not have any intuitive sense of how the choice of  
> prior
> may be constraining the variance estimates.  Is it for example  
> possible
> to say that the effect is comparable to adding  some number of degrees
> of freedom to the variance estimate?  What is the balance between
> what comes from the prior, and what derives from the data?
> John.
>
> John Maindonald             email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
> http://www.maths.anu.edu.au/~johnm
>
> On 12/11/2010, at 10:51 PM, Jarrod Hadfield wrote:
>
>> Hi John,
>>
>> I'm not sure which part of the MCMCglmm routine is unclear but in a  
>> nut shell:
>>
>> The data are distributed according to the specified distribution  
>> (e.g. Poisson ) with the dsitribution parameter equal to the linear  
>> predictor (nu) on the inverse link scale:
>>
>> y ~ Pois(exp(nu))
>>
>> A linear (mixed) model is proposed for the linear predictor:
>>
>> nu ~ N(Xb+Zu, I*Ve)
>>
>> In all models a "units" term is included which is the same as an  
>> observational-level random effect.  This is the residual term in  
>> the linear part of the model.
>>
>> Priors are passed as  a list of the form:
>>
>> prior=list(B=list(),
>>              R=list(),
>>              G=list(G1=list(),
>>                          G2=list()))
>>
>>
>> where B is the prior for the fixed effects, R is the prior for Ve  
>> (or possibly a residual covariance matrix depending on the model)  
>> and the elements of G are the priors for each of the random effect  
>> (co)variances.
>>
>>
>> The prior for b is multivariate normal Pr(b) ~ N(B$mu, B$V)
>>
>> The prior for Ve is inverse-Wishart Pr(Ve) ~ IW(R$nu, R$V*R$nu)  
>> where the first term is the degree of belief and R$V*R$nu is the  
>> scale matrix
>>
>> The prior for the random effects variances (Vg) are set in the same  
>> way as Ve, although its possible to also fit parameter expanded  
>> priors.
>>
>> When the variance term is scalar (rather than a covariance matrix)  
>> then the inverse-Wishart and inverse-gamma are the same and IW(R 
>> $nu, R$V*R$nu) = IG(R$nu/2, R$V*R$nu/2) where the two parameters  
>> are the shape and scale. Your specification R=list(V=1, nu=0.002)   
>> is therefore  IG(0.001, 0.001) which used to be used a lot (and  
>> probably still is) as a "weak" prior by users of WinBUGS.  The REML  
>> estimator for Ve should be equal to the posterior mode in MCMCglmm  
>> when the improper prior R=list(V=0, nu=-2)  is used. 0 will not be  
>> accepted by MCMCglmm so you would have to use something small (e.g.  
>> 1e-6). The prior is improper so you have to be careful when using  
>> it, and although the point estimate may have some useful properties  
>> (the same as the REML estimate) the full posterior distribution may  
>> not be ideal with small sample sizes.
>>
>> The sampling scheme is:
>>
>> update nu using MH updates or slice sampling
>> Gibbs sample b and u together
>> Gibbs sample (co)variance components sequentially.
>>
>> For other models  such as ordinal regression or simultaneous/ 
>> recursive models there are other schemes also, but these are quite  
>> specialised.
>>
>> Cheers,
>>
>> Jarrod
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On 11 Nov 2010, at 23:45, John Maindonald wrote:
>>
>>> The Wald tests (as represented in glm and glmer output of z- 
>>> statistics
>>> and p-values) are approximate for both glm models and glmm models
>>> with non-identity links. The approximation can fail badly if the  
>>> link is
>>> highly non-linear over a region of the response that is relevant  
>>> for a
>>> parameter of interest.  The Hauck-Donner phenomenon, where the
>>> z-statistic decreases as the effect estimate increases, is an  
>>> extreme
>>> example.
>>> (This happens, e.g., as one of the levels being compared gives a
>>> fitted value that moves close to a binomial proportion of 0 or 1, or
>>> close to a Poisson estimate of 0.)
>>>
>>> The additional complication for a glmm is that the SE may have
>>> two components -- e.g., a poisson or binomial error, and a random
>>> normal error that is added on the scale of the linear predictor.   
>>> This
>>> random normal error somewhat alleviates variance change effects
>>> (including Hauck-Donner) that result from non-linearity in the link,
>>> while adding uncertainty to the SE estimate.  If the contribution  
>>> from
>>> the glm family error term is largish relative to contribution from  
>>> the
>>> random normal error (> 3 or 4  times as large?), treating the z- 
>>> statistic
>>> as normal may not in many circumstances be too unreasonable,
>>> even if the relevant degrees of freedom are as small as maybe 4
>>> (e.g., where the test is for consistency across 5 locations).
>>>
>>> On data where I seem to have this situation, fairly consistently
>>> across a number of responses, I initially used MCMCglmm().
>>> I was concerned about the contribution from the random normal
>>> error (including a contribution from an observation level random
>>> effect term).  I found I hat I had to choose a somewhat informative
>>> prior (inverse Wishart with V=1, nu=0.002) to consistently get
>>> convergence.  With this prior, the MCMCglmm credible intervals
>>> were remarkably close to glmer confidence intervals, treating the
>>> z-statistics as normal.
>>>
>>> No doubt the effect of the chosen prior is to insist that the true
>>> random normal error variance is fairly close to the variance as
>>> estimated by glmer().  A frustration (for me, at least) with using
>>> MCMCglmm is that I do not know just what MCMCglmm() is doing
>>> in this respect, short of doing some careful investigative  
>>> exploration
>>> of the MCMC simulation results (which is an after-the-event check,
>>> where I'd like to know before the event).  Comments, Jarrod?
>>>
>>> All this is to emphasise that we are not in the arena, unless the
>>> relevant degrees of freedom are large, of precise science.
>>>
>>> John Maindonald             email: john.maindonald at anu.edu.au
>>> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
>>> Centre for Mathematics & Its Applications, Room 1194,
>>> John Dedman Mathematical Sciences Building (Building 27)
>>> Australian National University, Canberra ACT 0200.
>>> http://www.maths.anu.edu.au/~johnm
>>>
>>> On 12/11/2010, at 2:18 AM, Ben Bolker wrote:
>>>
>>>> On 11/11/2010 09:58 AM, Jens Åström wrote:
>>>>> Dear list,
>>>>>
>>>>> As I have read (Bolker et al. 2009 TREE), the Wald Z test is only
>>>>> appropriate for GLMMs in cases without overdispersion.
>>>>>
>>>>> Assuming we use family=poisson with lmer and tackle  
>>>>> overdispersion by
>>>>> incorporating an individual random effect AND this adequately  
>>>>> "reduces"
>>>>> the overdispersion, is it then OK to use the Wald z test as  
>>>>> reported by
>>>>> lmer?
>>>>>
>>>>> In other words, are the p-values reported by lmer in those cases
>>>>> useful/"correct"? Or do they suffer from the usual problems with
>>>>> figuring out the number of parameters used by the random effects?
>>>>
>>>> They are equivalent to assuming an infinite/large 'denominator  
>>>> degrees
>>>> of freedom'.  If you have a large sample size (both a large  
>>>> number of
>>>> total samples relative to the number of parameters, and a large  
>>>> number
>>>> of random-effects levels/blocks) then this should be reasonable  
>>>> -- if
>>>> not, then yes, the 'usual problems with figuring out the number of
>>>> parameters' is relevant.  On the other hand, if you're willing to  
>>>> assume
>>>> that the sample size is large, then likelihood ratio rests
>>>> (anova(model1,model2)) are probably better than the Wald tests  
>>>> anyway.
>>>>
>>>>>
>>>>> Secondly, is it good practice to judge lmer's capability of  
>>>>> "reducing"
>>>>> the overdispersion by summing the squared residuals (pearson) and
>>>>> compare this to a chi square distribution (with N-1 degrees of  
>>>>> freedom)?
>>>>
>>>> I would say this is reasonable, although again it's a rough guide
>>>> because the true degrees of freedom are a bit fuzzy -- it should
>>>> probably be at most N-(fixed effect degrees of freedom)?
>>>>
>>>> Would be happy to hear any conflicting opinions.
>>>>
>>>> Ben Bolker
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>
>>
>> -- 
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.




More information about the R-sig-mixed-models mailing list