[R-sig-ME] Back-transformation of Poisson model

Tue Mar 15 19:27:18 CET 2016

Thanks Jarrod,

You’re right that the exp(Xb + 0.5v) correction for Jensen’s inequality is better in a model with more levels of the random effect. For 10,000 groups and 100 observations per group, it has about 10% error. I couldn’t do a ton of replicates because it’s slow. I rewrote the simple example (below) in case other people want to play around with it. 

I think the answer to Q2 may be that predict.merMod does not marginalize over random effects to get predictions on the response scale. A lot of people don’t realize this, so I might add an issue about documentation to the lme4 github page, unless the authors chime in.

cheers,
Mollie

library(lme4)

# simulate Poisson data with random intercept
ng=100 #number of groups
dat=data.frame(group=paste0('g', rep(1:ng, each=100)))
x=rnorm(ng, mean=5, sd=2) 
dat$y = rpois(nrow(dat), exp(x[dat$group]))
mean(dat$y)

# fit model
m1=glmer(y ~ (1 | group), family = poisson, dat)
#dat$obs=1:nrow(dat)
#m2=glmer(y ~ (1 | group) + (1|obs), family = poisson, dat)

#estract estimates
summary(m1)
exp(fixef(m1)+ 0.5*VarCorr(m1)$group[1])#from MCMCglmm Coursenotes p.46

nd=data.frame(group=paste0('g', -5:-1)) #add new levels
str(predict(m1, newdata=nd, type="response", re.form=NULL, allow.new.levels=TRUE))
str(predict(m1, newdata=nd, type="response", re.form=NA, allow.new.levels=TRUE))
str(predict(m1, newdata=nd, type="response", re.form=NA))
str(predict(m1, type="response", re.form=NA))

mean(exp(coef(m1)[[1]][,1])) 
------------------------
Mollie Brooks, PhD
Postdoctoral Researcher, Population Ecology Research Group
Department of Evolutionary Biology & Environmental Studies, University of Zürich
http://www.popecol.org/team/mollie-brooks/

> On 14Mar 2016, at 21:04, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
> 
> Hi Mollie,
> 
> In simple models it is the sum of the variance components, although with things such as random regression it gets a bit more complicated.  For log-link Poisson models the mean is simply the mean of a log-normal. For logit models there is not an anlytical solution, and the approximation in the CourseNotes I certainly got from somewhere but can't now remember where from: somebody else has pointed out that my McCulloch & Searle citation does not have the relevant material so I will try and hunt down where the result comes from.
> 
> I think the deviation of your predicted and actual mean arises because of sampling error in the intercept/variance. The mean of an exponentiated sum of two normals will be larger than the exponentiated sum of the means of two normals. Consequently an unbiased estimator of the intercept and variance would lead to an upwardly biased estimator of the data-scale mean. Not sure about this though! Perhaps see if the predicted and actual mean become closer if you up the number of random effect levels (and so the intercept/variance become more precisely estimated).
> 
> Regarding Q2 I'm also not sure. In predict.MCMCglmm you can specifiy the random effects you want to marginalise over, but I *think* this is not implemented in lme4?
> 
> Cheers,
> 
> Jarrod
> 
> 
> 
> 
> 
> 
> 
> On 14/03/2016 16:34, Mollie Brooks wrote:
>> Hi R-sig-mixed,
>> I found the below discussion of back-transformations from the log scale and the same formula is on page 46 of the MCMCglmm course notes pdf (version June 20, 2015). I can’t find a copy of the cited book McCulloch CE, Searle SR (2001) or any other source for this information. I have one point that I’m still wondering about…should v, the sum of the variance components, include residual error in any model, i.e. not just residual error that is part of an observation-level random effect for over-dispersed models? In the following example, the formula, exp(Xb+0.5*v) seems to be giving me an estimate that is consistently higher than my population mean. So adding the residual variance will make it even worse. Is this normal?
>> Now for the R-specific part of the question... is there a version of predict.merMod that gives the the predictions on the response scale averaged over possible realizations of the random effects, i.e. marginalized with respect to the random effects? I can’t get any predictions close to the observed mean in the following example. I expected that a new level of the random effect would be given a marginalized prediction. Maybe this could be added to the documentation.
>> thanks,
>> Mollie
>> 
>> library(lme4)
>> 
>> # simulate Poisson data with random intercept
>> set.seed(11)
>> dat=data.frame(group=paste0('g', rep(1:10, each=100)))
>> x=rnorm(10, mean=5,sd=2)
>> dat$y = rpois(nrow(dat), exp(x[dat$group]))
>> mean(dat$y)#observed mean
>> 
>> # fit model
>> m1=glmer(y ~ (1 | group), family = poisson, dat)
>> 
>> #estract estimates
>> str(predict(m1,  type="response", re.form=NA))
>> exp(fixef(m1)+ 0.5*VarCorr(m1)$group[1])#from MCMCglmm Coursenotes p.46
>> #not very close to the mean
>> 
>> nd=data.frame(group=paste0('g', -5:-1)) #add new levels
>> str(predict(m1, newdata=nd, type="response", re.form=NULL, allow.new.levels=TRUE))
>> str(predict(m1, newdata=nd, type="response", re.form=NA, allow.new.levels=TRUE))
>> str(predict(m1, newdata=nd, type="response", re.form=NA))
>> str(predict(m1, type="response", re.form=NA))
>> mean(exp(coef(m1)[[1]][,1]))#only number close to the observed mean
>> 
>>> ----- Forwarded message from j.hadfield at ed.ac.uk <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> -----
>>>      Date: Tue, 12 Jan 2010 18:38:06 +0000
>>>      From: Jarrod Hadfield <j.hadfield at ed.ac.uk <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>>
>>> Reply-To: Jarrod Hadfield <j.hadfield at ed.ac.uk <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>>
>>>   Subject: Re: [R-sig-ME] Back-transformation of Poisson model
>>>        To: "Kardynal,Kevin [Yel]" <Kevin.Kardynal at EC.gc.ca <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>>
>>> 
>>> Dear Kevin,
>>> 
>>> You will have to exponentiate the output from predict.
>>> 
>>> It is worth noting there are three types of prediction (on the
>>> data-scale) you could make:
>>> 
>>> a) exp(Xb)
>>> b) exp(Xb+Zu)
>>> c) int exp(Xb)du assuming the u's are iid.
>>> 
>>> X - fixed effect design matrix  b - fixed effects
>>> Z - random effect design matrix  u - fixed effects
>>> 
>>> You have done the first, but this can be very different from c, which
>>> is what I expect you want: the predictions averaged over possible
>>> realisations of the random effects.
>>> 
>>> To obtain c use exp(Xb+0.5*v)
>>> 
>>> where v is the sum of the variance components. If you use additive
>>> over-dispersed models you want to include the "residual" variance in
>>> v, also.
>>> 
>>> Cheers,
>>> 
>>> Jarrod
>>> 
>>> Quoting "Kardynal,Kevin [Yel]" <Kevin.Kardynal at EC.gc.ca <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>>:
>>> 
>>>> Hello,
>>>> 
>>>> I'd like to know if predicted values from  a Poisson mixed model require
>>>> back-transformation to get the 'real' predicted values or if this is
>>>> done automatically? I assume that the predicted values are already
>>>> back-transformed since my results are similar to the annual means.
>>>> 
>>>> My model for estimating bird trends in lme4 is as such:
>>>> 
>>>> lme3<-lmer(Abundance ~ Year +  (1|Observer) + (1|Site/Station),
>>>> data=SWTH, family = poisson(link=log))
>>>> 
>>>> I then took the coefficients from the mixed model and used them in a GLM
>>>> to derive the predicted values.
>>>> dd <-fixef(lme3) # gives coefficients
>>>> 
>>>> lme_glmDEY <- glm(Abundance ~ Year, data=SWTH, family = poisson(link =
>>>> log)) # for poisson distribution
>>>> lme_glmDEY$coefficients <- dd
>>>> 
>>>> # create a data set of predicted values
>>>> pred <- as.data.frame(predict(lme_glmDEY, SWTH , "response", se.fit=T))
>>>> pred_val <- as.data.frame(cbind(SWTH,pred))
>>>> 
>>>> So, do I have to perform a back-transformation on the predicted values
>>>> (ie., using exp())?
>>>> 
>>>> Thanks,
>>>> 
>>>> Kevin
>>>> 
>>>> 	[[alternative HTML version deleted]]
>>>> 
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models> mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>> 
>>>> 
>>> 
>>> 
>>> -- 
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>> 
>>> 
>>> 
>>> ----- End forwarded message -----
>>> 
>>> 
>>> -- 
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>> 
>> ------------------------
>> Mollie Brooks, PhD
>> Postdoctoral Researcher, Population Ecology Research Group
>> Department of Evolutionary Biology & Environmental Studies, University of Zürich
>> http://www.popecol.org/team/mollie-brooks/ <http://www.popecol.org/team/mollie-brooks/>
>> 	[[alternative HTML version deleted]]
>> 
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> 
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

	[[alternative HTML version deleted]]