[R-sig-ME] MCMCglmm with exponential family

Sun Sep 8 01:39:20 CEST 2013

Iain Stott <i.stott at ...> writes:

> 
> Hi R-users,
> 
> I'm having some problems interpreting mixed effects models using the
> exponential family. Struggling to find any info on such models on the
> interwebs.
> 
> Some background: I'm working with soil carbon data and need robust
> estimates of concave-up, decreasing depth curves, and confidence intervals
> on them, which MCMCglmm should be able to provide. The usual way to fit
> such curves is with a (negative) exponential model, which again is
> supported by MCMCglmm (my understanding is that the exponential and
> negative exponential distibutions are the same 
> in everything but name?). My
> data is pretty much exponentially-distributed (more precisely
> gamma-distributed with a very strong right skew), therefore is real with
> range 0<x<Inf.

  Although MCMCglmm does have an "exponential" family, you should be
careful in terminology -- I got a bit confused becase "exponential
family" is also the term for the broad class of functions that
GLM can handle (e.g. Normal, Poisson, Binomial, and Gamma -- of
which Exponential is a special case).

  To the extent that you want exponential relationships between
predictors and responses, you should use a log link (which makes the
inverse-link function exponential, making the mean responses
exponential functions of linear combinations of the predictors).  To
the extent that you want continuous, positive, skewed distributions of
the responses, you should _either_ use a Gamma distribution (which is
part of the exponential family), or transform accordingly.
(An exponential is a special case of the Gamma, so that should
work OK ... although you might want some additional flexibility
with your actual data ...)

  I often log-transform and fit a linear mixed model (i.e. assume
the data are conditionally log-Normal) in this situation -- happily,
log-transforming both normalizes the responses and linearizes the
relationships; Gamma and log-Normal curves have very similar shapes.

I checked the "CourseNotes" vignette -- on p. 123 suggests that the
"exponential" family uses a *negative-log* link, rather than
an inverse link (!!), i.e. y = exp(-eta) where eta is the
linear predictor (X beta).

> The models: I'm fitting models with one continuous variable (depth), and
> two categorical variables (soil type / land cover). I've included a single
> random variable (site), for which I'm fitting either just an intercept or a
> fully-(co)varying intercept and slope combination using us(). The fixed
> effects take default priors of N(0, 10e8) and random effects and residuals
> take uninformative proper priors with V=1, nu=0.002. Below, I've pasted
> code which will replicate my data and model structure: the range of
> solutions (intercepts, slopes) I'm getting out of these dummy models are in
> accordance with what's being computed for my data.
> 
> The results: Models converge well (albeit after a large number of
> iterations), but the estimates I'm getting out of them make no sense to me
> at all. I must be missing something somewhere! Transformed curves do not
> fit the raw data, and transformed data do not fit the fitted lines. I'm
> using the canonical inverse link for the exponential distribution,
> Xbeta=-mu^-1 (and various variations on that theme, to no avail...)
> 
> I can think of three possible reasons I can't make sense of this:
> 
> 1. My understanding of the exponential distribution is woefully poor: that
> it is not equivalent to the negative exponential distribution (in which
> case Wikipedia would be lying to me) or that I am missing a crucial piece
> of theory somewhere along the line.
> 
> 2. Default fixed priors of N(0, 10e8) are completely inadequate for
> exponential data
> 
> 3. I've misinterpreted the link function in some way: it's not inverse, the
> inverse of inverse is for some reason not inverse (!) or I'm using the
> wrong link altogether.
> 
> I can't find anything to contradict my thoughts on these points. So, if
> someone with a better understanding of such things could offer me some
> words of wisdom, point out to me where I'm making stupid mistakes, it would
> be hugely appreciated!
> 
> Thanks
> 
> Iain
> 
> CODE:
> 
> MODELS:
>

library("MCMCglmm")
set.seed(101)
resp<-rgamma(406,scale=0.5,shape=1.4)*75
## hist(resp,breaks=100)
fix1<-rep(c("A","B","C","D","E","F","G"),58);
fix1<-fix1[order(rnorm(406))]; fix1<-as.factor(fix1)
fix2<-rep(c("ONE","TWO"),203); fix2<-fix2[order(rnorm(406))];
fix2<-as.factor(fix2)
fix3<-runif(406)*100

ran1<-rep(1:58,7); ran1<-as.factor(ran1)

data<-data.frame(resp,fix1,fix2,fix3,ran1)

nitt<-100000
burn<-nitt*0.1
thin<-nitt*0.005

prior1.0<-list(R = list(V=1, nu=0.002),
               G = list(G1=list(V=1, nu=0.002)))

prior1.1<-list(R = list(V=1, nu=0.002),
               G = list(G1=list(V=diag(1,2), nu=0.002)))

m1.0<-MCMCglmm(resp~fix1+fix2+fix3+fix1:fix3+fix2:fix3,
               random=~ran1, prior=prior1.0, data=data,
                family="exponential", verbose=F,
                nitt=nitt, burnin=burn, thin=thin)

m1.1<-MCMCglmm(resp~fix1+fix2+fix3+fix1:fix3+fix2:fix3,
                random=~us(1+fix3):ran1, prior=prior1.1, data=data,
family="exponential", verbose=F,
                nitt=nitt, burnin=burn, thin=thin)

> 
> EXAMPLE FIT:
> 
sols<-summary(m1.0)$solutions

x<-1:100
y<-sols["(Intercept)",1]+x*sols["fix3",1]
plot(resp~fix3); lines(x,exp(-y),col=2,lwd=3)
head(cbind(x,y,exp(-y)))

respslice <- resp[fix3>20 & fix3<30]
plot(density(respslice,from=0))
rug(respslice)
y0 <- sols["(Intercept)",1]+25*sols["fix3",1]
xvec <- 0:250
lines(xvec,dexp(xvec,rate=1/exp(-y0)),col=2)