[R-sig-ME] MCMCglmm with exponential family

Joshua Wiley jwiley.psych at gmail.com
Tue Sep 10 00:51:58 CEST 2013


Hi Iain,

I wrote a package to get predictions from MCMCglmm---importantly the
predictions are based on the full posterior, which aside from making
them slow and computationally intensive, means that you can
marginalize the predictions (or transform, or get credible intervals,
or, ...) without having to work out the analytical solution.

The downside is that I have not implemented it for the exponential
family, but the code base may be a place for you to start.  And if you
want, I may have some time to work with you and could incorporate
support for exponential if you can help provide some simple examples,
and help check the accuracy of results.

The package is here on github: https://github.com/jwiley/postMCMCglmm

Cheers,

Josh


On Mon, Sep 9, 2013 at 12:35 PM, Iain Stott <i.stott at exeter.ac.uk> wrote:
> An update: I did work out the issue shortly after posting the email, and
> confirmed my mistake with Jarrod Hadfield, who was very helpful! A scan of
> his J. Stat. Soft. paper showed me I was using the wrong link: for
> exponential family, the latent variables (l) are mapped onto the data (y)
> using y=exp(-l).
>
> David Atkins' point (below: thanks for your help!) is also helpful: for
> non-identity links there is a difference between conditional and
> marginalised posteriors, and to get the expected marginal value rather than
> the conditional mode it's necessary to average over the random effects.
> This is something I wasn't aware of! The predict method for MCMCglmm
> doesn't support the exponential family yet, but by hacking the code and
> augmenting with info from [
> http://depts.washington.edu/cshrb/wordpress/wp-content/uploads/2013/04/Tutorials-Tutorial-on-Count-Regression-R-code.txt]
> it should be possible... except that I'm not sure on the analytical
> solutions for marginalising random effects in exponential models. If anyone
> has a solution or suggestion, it'd be very welcome. Failing that I'll
> continue to google, or acquiesce to a less-satisfactory polynomial Gaussian
> model or gamma lmer.
>
> Thanks all
>
> Iain
>
>
>
> On 6 September 2013 21:00, David Atkins <datkins at u.washington.edu> wrote:
>>
>>>
>>> Iain--
>>>
>>> Just writing back to you as a) I'm not sure about this, and b) it isn't
>>> necessarily R specific.
>>>
>>> One of the things that has bitten me a couple times with mixed models
>>> with a non-identity link function is the distinction between conditional
>>> and marginal effects.  With normal errors and identity link function,
>>> predictions from a mixed model essentially average over the random effects.
>>>  With non-identity link functions this is no longer true, and the
>>> predictions based on fixed-effects only are conditional on specific levels
>>> of the random effects.  It is possible to derive marginal predictions,
>>> which do in effect average over the random effects, but this is somewhat
>>> more involved and the predictions include random-effects in them.
>>>
>>> Not sure whether this is what is happening with your data -- since, I'm a
>>> psychologist who mostly analyzes psychiatric / drug abuse data, but I would
>>> not necessarily assume that your predictions based only on the
>>> fixed-effects would give 'good' predictions of your raw data.
>>>
>>> You might check out an article that colleagues and I wrote on mixed
>>> models; in particular, the appendix covers this issue specifically (for
>>> Poisson models):
>>>
>>> http://depts.washington.edu/**cshrb/statistics-resources-**
>>> tutorials-tutorial-on-count-**regression/<http://depts.washington.edu/cshrb/statistics-resources-tutorials-tutorial-on-count-regression/>
>>>
>>> Hope that might help.
>>>
>>> cheers, Dave
>>>
>>>
>>> Hi R-users,
>>>
>>> I'm having some problems interpreting mixed effects models using the
>>> exponential family. Struggling to find any info on such models on the
>>> interwebs.
>>>
>>> Some background: I'm working with soil carbon data and need robust
>>> estimates of concave-up, decreasing depth curves, and confidence intervals
>>> on them, which MCMCglmm should be able to provide. The usual way to fit
>>> such curves is with a (negative) exponential model, which again is
>>> supported by MCMCglmm (my understanding is that the exponential and
>>> negative exponential distibutions are the same in everything but name?).
>>> My
>>> data is pretty much exponentially-distributed (more precisely
>>> gamma-distributed with a very strong right skew), therefore is real with
>>> range 0<x<Inf.
>>>
>>> The models: I'm fitting models with one continuous variable (depth), and
>>> two categorical variables (soil type / land cover). I've included a single
>>> random variable (site), for which I'm fitting either just an intercept or
>>> a
>>> fully-(co)varying intercept and slope combination using us(). The fixed
>>> effects take default priors of N(0, 10e8) and random effects and residuals
>>> take uninformative proper priors with V=1, nu=0.002. Below, I've pasted
>>> code which will replicate my data and model structure: the range of
>>> solutions (intercepts, slopes) I'm getting out of these dummy models are
>>> in
>>> accordance with what's being computed for my data.
>>>
>>> The results: Models converge well (albeit after a large number of
>>> iterations), but the estimates I'm getting out of them make no sense to me
>>> at all. I must be missing something somewhere! Transformed curves do not
>>> fit the raw data, and transformed data do not fit the fitted lines. I'm
>>> using the canonical inverse link for the exponential distribution,
>>> Xbeta=-mu^-1 (and various variations on that theme, to no avail...)
>>>
>>>
>>> I can think of three possible reasons I can't make sense of this:
>>>
>>> 1. My understanding of the exponential distribution is woefully poor: that
>>> it is not equivalent to the negative exponential distribution (in which
>>> case Wikipedia would be lying to me) or that I am missing a crucial piece
>>> of theory somewhere along the line.
>>>
>>> 2. Default fixed priors of N(0, 10e8) are completely inadequate for
>>> exponential data
>>>
>>> 3. I've misinterpreted the link function in some way: it's not inverse,
>>> the
>>> inverse of inverse is for some reason not inverse (!) or I'm using the
>>> wrong link altogether.
>>>
>>>
>>> I can't find anything to contradict my thoughts on these points. So, if
>>> someone with a better understanding of such things could offer me some
>>> words of wisdom, point out to me where I'm making stupid mistakes, it
>>> would
>>> be hugely appreciated!
>>>
>>> Thanks
>>>
>>> Iain
>>>
>>>
>>>
>>> CODE:
>>>
>>>
>>> MODELS:
>>>
>>> resp<-rgamma(406,scale=0.5,**shape=1.4)*75
>>> hist(resp,breaks=100)
>>>
>>> fix1<-rep(c("A","B","C","D","**E","F","G"),58);
>>> fix1<-fix1[order(rnorm(406))]; fix1<-as.factor(fix1)
>>> fix2<-rep(c("ONE","TWO"),203); fix2<-fix2[order(rnorm(406))];
>>> fix2<-as.factor(fix2)
>>> fix3<-runif(406)*100
>>>
>>> ran1<-rep(1:58,7); ran1<-as.factor(ran1)
>>>
>>> data<-data.frame(resp,fix1,**fix2,fix3,ran1)
>>>
>>> nitt<-1000000
>>> burn<-nitt*0.1
>>> thin<-nitt*0.0005
>>>
>>> prior1.0<-list(R = list(V=1, nu=0.002),
>>>                G = list(G1=list(V=1, nu=0.002)))
>>>
>>>
>>> prior1.1<-list(R = list(V=1, nu=0.002),
>>>                G = list(G1=list(V=diag(1,2), nu=0.002)))
>>>
>>> m1.0<-MCMCglmm(resp~fix1+fix2+**fix3+fix1:fix3+fix2:fix3,
>>>                random=~ran1, prior=prior1.0, data=data,
>>> family="exponential", verbose=F,
>>>                nitt=nitt, burnin=burn, thin=thin)
>>>
>>> m1.1<-MCMCglmm(resp~fix1+fix2+**fix3+fix1:fix3+fix2:fix3,
>>>                random=~us(1+fix3):ran1, prior=prior1.1, data=data,
>>> family="exponential", verbose=F,
>>>                nitt=nitt, burnin=burn, thin=thin)
>>>
>>> EXAMPLE FIT:
>>>
>>> sols<-summary(m1.0)$solutions
>>>
>>> x<-1:100
>>> y<-sols["(Intercept)",1]+x***sols["fix3",1]
>>> plot(resp~fix3); lines(x,-y^-1)
>>> cbind(x,y,-y^-1)
>>>
>>>
>>>
>>>
>>>
>>> - - - - - - - - - - - -
>>> Dr. Iain Stott
>>> Environment and Sustainability Institute
>>> University of Exeter, Penryn Campus
>>> Treliever Road, Penryn,
>>> Cornwall, TR10 9EZ, UK.
>>> http://www.exeter.ac.uk/esi/**people/stott/<http://www.exeter.ac.uk/esi/people/stott/>
>>> - - - - - - - - - - - -
>>> http://www.exeter.ac.uk/esi/
>>> http://biosciences.exeter.ac.**uk/cec/<http://biosciences.exeter.ac.uk/cec/>
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> --
>>> Dave Atkins, PhD
>>>
>>> Research Professor
>>> Department of Psychiatry and Behavioral Science
>>> University of Washington
>>> datkins at u.washington.edu
>>> http://depts.washington.edu/**cshrb/david-atkins/#more-48<http://depts.washington.edu/cshrb/david-atkins/#more-48>
>>>
>>> "The combination of some data and an aching desire for an answer does not
>>> ensure that a reasonable answer can be extracted from a given body of
>>> data." -- John Tukey
>>>
>>
>>
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



More information about the R-sig-mixed-models mailing list