[R-sig-ME] Very large HPDintervals, mcmcsamp, lmer

Wed Dec 7 21:12:50 CET 2011

Thierry,

I have two quick follow up questions for my model that I hope you can
can answer for me.
model<-lmer(TN ~ wsh*rip + (1|stream), data=all24)

First, the one I should know: how did you determine the number of
parameters? Watershed (4) + reach (2) +reach x watershed (1) = 7?

And how many parameters are in this model? 9?
model<-glmer(Y ~ wsh*rip + (1|stream) + (1|stream:rip) + (1|obs),
data=ept, family=binomial(link="logit"))

Second: is there a simple explanation for why 10 observations per
parameter is the rule of thumb?

Thank you

Colin

On Tue, Nov 22, 2011 at 12:59 AM, ONKELINX, Thierry
<Thierry.ONKELINX at inbo.be> wrote:
> Dear Colin,
>
> Your sample size is too small for you model. You model has 7 fixed and 1 random parameter. You could use 10 observations per parameters as a rule of thumb. Hence your model would require about 80 obversations.
>
> This leaves you with two options: a) collect more data b) reduce the model.
>
> Best regards,
>
> Thierry
>
>> -----Oorspronkelijk bericht-----
>> Van: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-
>> bounces at r-project.org] Namens Colin Wahl
>> Verzonden: maandag 21 november 2011 23:54
>> Aan: r-sig-mixed-models at r-project.org
>> Onderwerp: [R-sig-ME] Very large HPDintervals, mcmcsamp, lmer
>>
>> Hello all,
>> I am using lmer to describe patterns in stream variables between streams
>> draining watersheds with different land uses. I need to make some kind of
>> determination of significance. I understand that mixed models cannot provide
>> accurate p values because the denominator df is unknown. As per D. Bates
>> recommendation in his post "lmer, p-values and all that" I used mcmcsamp and
>> HPDinterval to get 95% "confidence intervals." These intervals were very large
>> and suggest that there are no significant differences in variables where
>> differences seem obvious. I'll use one of my variables as an example: total
>> stream nitrogen. There are 12 streams and 24 sites (two riparian types per
>> stream).
>>
>> model<-lmer(TN ~ wsh*rip + (1|stream), data=all24)
>>    Data: all24
>>    AIC   BIC logLik deviance REMLdev
>>  78.47 90.25 -29.24    65.15   58.47
>> Random effects:
>>  Groups   Name        Variance Std.Dev.
>>  stream   (Intercept) 4.10443  2.0259
>>  Residual             0.20885  0.4570
>> Number of obs: 24, groups: stream, 12
>>
>> Fixed effects:
>>                  Estimate Std. Error t value
>> (Intercept)  6.04887    1.19903   5.045
>> wshD        -4.65847    1.69569  -2.747
>> wshF        -4.85194    1.58617  -3.059
>> wshG        -5.18847    1.89584  -2.737
>> ripN            -0.18540    0.37314  -0.497
>> wshD:ripN    0.15800    0.52770   0.299
>> wshF:ripN    0.09825    0.49362   0.199
>> wshG:ripN    0.00915    0.58998   0.016
>>
>> Estimates are accurate; intercept(cultivated) watersheds clearly have a much
>> higher nitrogen concentration than the other watersheds. I use mcmcsamp and
>> HPD interval to get 95% CIs:
>>
>> model.mcmc<-mcmcsamp(model, n=1000)
>> model.HPDi<-HPDinterval(model.mcmc, prob=0.95)
>>
>>                       Estimate          Std. Error            t
>>    lower                upper
>> Cultivated       6.04886663   1.19903192   5.04479200   3.75759240   8.40959042
>> Developed        1.39040003   1.69568720  -2.74724406  -4.27036665
>> 6.80232811
>> Forested         1.19692503   1.58617014  -3.05890364  -4.53524163   6.52712926
>> Grassland        0.86039999   1.89583593  -2.73676987  -4.96308190   6.93964588
>> Riparia(Cult)    5.86346654   0.37313911  -0.49686587   0.52284572
>> 11.44574598
>> Riparia(Devel)   1.36300000   0.52769839   0.29941356 -12.15807916
>> 14.22417096
>> Riparia(Fores)   1.10977501   0.49361665   0.19904123 -11.93267982
>> 13.77391850
>> Riparia(Grass)   0.68414998   0.58998474   0.01550901 -13.67422625
>> 14.50577622
>>
>> Considering that the CI are substantially overlapped, I interpret these CIs to
>> suggest there is no detectable differences between cultivated streams and other
>> streams. This cant be right.
>>
>> I have a very elementary understanding of Bayesian statistics, and have not used
>> mcmc or HPDinterval before. Is it apparent that I've done something wrong? Is
>> my sample size too small? I hope this isnt just a problem with calculating CIs
>> from the output, which I'm pretty sure is done the same way its done with lmer
>> (see below).
>>
>> Please help; I'm desperate to wrap up this analysis.
>>
>> Thank you,
>> Colin Wahl
>> Graduate Student
>> Western Wash. University
>> Bellingham, WA USA
>>
>>
>>
>> CI recalculation:
>>
>> I would be confident that CIs for each fixed effect are relative to the intercept,
>> but the CIs look wrong so I am questioning myself. Is it true that the
>> HPDintervals for each fixed effect are relative to the intercept coefficient, just
>> as they are in the initial lmer output?
>>
>> i.e...
>>                 lower     upper
>> (Intercept)  3.684270  8.626003
>> wshD        -8.256167 -1.259411
>> wshF        -8.469073 -1.852237
>> wshG        -9.338573 -1.498553
>> ripN        -3.309220  3.105756
>> wshD:ripN   -4.412022  4.944243
>> wshF:ripN   -4.452947  4.422371
>> wshG:ripN   -5.510786  4.917613
>>
>> becomes:
>>
>>                   lower     upper
>> (Intercept)   3.6842697  8.626003
>> wshD         -4.5718974  7.366592
>> wshF         -4.7848029  6.773766
>> wshG         -5.6543034  7.127450
>> ripN          0.3750498 11.731759
>> ripD        -12.2931391 15.416590
>> ripF        -12.5469694 14.301893
>> ripG        -14.4743090 15.150819
>>
>> Using the following code:
>> ci<-as.matrix(model.HPDi$fixef)
>> C=ci[1,]
>> D=C+ci[2,]
>> FF=C+ci[3,]
>> G=C+ci[4,]
>> CN=C+ci[5,]
>> DN=(D)+(ci[5,]+ci[6,])
>> FN=(FF)+(ci[5,]+ci[7,])
>> GN=(G)+(ci[5,]+ci[8,])
>> confint<-c(C,D,FF,G,CN,DN,FN,GN)
>>
>>
>> --
>> CW
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-- 
CW