[R-sig-ME] Very large HPDintervals, mcmcsamp, lmer

Mon Nov 21 23:54:08 CET 2011

Hello all,
I am using lmer to describe patterns in stream variables between
streams draining watersheds with different land uses. I need to make
some kind of determination of significance. I understand that mixed
models cannot provide accurate p values because the denominator df is
unknown. As per D. Bates recommendation in his post "lmer, p-values
and all that" I used mcmcsamp and HPDinterval to get 95% "confidence
intervals." These intervals were very large and suggest that there are
no significant differences in variables where differences seem
obvious. I'll use one of my variables as an example: total stream
nitrogen. There are 12 streams and 24 sites (two riparian types per
stream).

model<-lmer(TN ~ wsh*rip + (1|stream), data=all24)
   Data: all24
   AIC   BIC logLik deviance REMLdev
 78.47 90.25 -29.24    65.15   58.47
Random effects:
 Groups   Name        Variance Std.Dev.
 stream   (Intercept) 4.10443  2.0259
 Residual             0.20885  0.4570
Number of obs: 24, groups: stream, 12

Fixed effects:
                 Estimate Std. Error t value
(Intercept)  6.04887    1.19903   5.045
wshD        -4.65847    1.69569  -2.747
wshF        -4.85194    1.58617  -3.059
wshG        -5.18847    1.89584  -2.737
ripN            -0.18540    0.37314  -0.497
wshD:ripN    0.15800    0.52770   0.299
wshF:ripN    0.09825    0.49362   0.199
wshG:ripN    0.00915    0.58998   0.016

Estimates are accurate; intercept(cultivated) watersheds clearly have
a much higher nitrogen concentration than the other watersheds. I use
mcmcsamp and HPD interval to get 95% CIs:

model.mcmc<-mcmcsamp(model, n=1000)
model.HPDi<-HPDinterval(model.mcmc, prob=0.95)

                      Estimate          Std. Error            t
   lower                upper
Cultivated       6.04886663   1.19903192   5.04479200   3.75759240   8.40959042
Developed        1.39040003   1.69568720  -2.74724406  -4.27036665   6.80232811
Forested         1.19692503   1.58617014  -3.05890364  -4.53524163   6.52712926
Grassland        0.86039999   1.89583593  -2.73676987  -4.96308190   6.93964588
Riparia(Cult)    5.86346654   0.37313911  -0.49686587   0.52284572  11.44574598
Riparia(Devel)   1.36300000   0.52769839   0.29941356 -12.15807916  14.22417096
Riparia(Fores)   1.10977501   0.49361665   0.19904123 -11.93267982  13.77391850
Riparia(Grass)   0.68414998   0.58998474   0.01550901 -13.67422625  14.50577622

Considering that the CI are substantially overlapped, I interpret
these CIs to suggest there is no detectable differences between
cultivated streams and other streams. This cant be right.

I have a very elementary understanding of Bayesian statistics, and
have not used mcmc or HPDinterval before. Is it apparent that I've
done something wrong? Is my sample size too small? I hope this isnt
just a problem with calculating CIs from the output, which I'm pretty
sure is done the same way its done with lmer (see below).

Please help; I'm desperate to wrap up this analysis.

Thank you,
Colin Wahl
Graduate Student
Western Wash. University
Bellingham, WA USA

CI recalculation:

I would be confident that CIs for each fixed effect are relative to
the intercept, but the CIs look wrong so I am questioning myself. Is
it true that the HPDintervals for each fixed effect are relative to
the intercept coefficient, just as they are in the initial lmer
output?

i.e...
                lower     upper
(Intercept)  3.684270  8.626003
wshD        -8.256167 -1.259411
wshF        -8.469073 -1.852237
wshG        -9.338573 -1.498553
ripN        -3.309220  3.105756
wshD:ripN   -4.412022  4.944243
wshF:ripN   -4.452947  4.422371
wshG:ripN   -5.510786  4.917613

becomes:

                  lower     upper
(Intercept)   3.6842697  8.626003
wshD         -4.5718974  7.366592
wshF         -4.7848029  6.773766
wshG         -5.6543034  7.127450
ripN          0.3750498 11.731759
ripD        -12.2931391 15.416590
ripF        -12.5469694 14.301893
ripG        -14.4743090 15.150819

Using the following code:
ci<-as.matrix(model.HPDi$fixef)
C=ci[1,]
D=C+ci[2,]
FF=C+ci[3,]
G=C+ci[4,]
CN=C+ci[5,]
DN=(D)+(ci[5,]+ci[6,])
FN=(FF)+(ci[5,]+ci[7,])
GN=(G)+(ci[5,]+ci[8,])
confint<-c(C,D,FF,G,CN,DN,FN,GN)

-- 
CW