[R-sig-ME] Very large HPDintervals, mcmcsamp, lmer
Colin Wahl
biowahl at gmail.com
Wed Dec 7 21:12:50 CET 2011
Thierry,
I have two quick follow up questions for my model that I hope you can
can answer for me.
model<-lmer(TN ~ wsh*rip + (1|stream), data=all24)
First, the one I should know: how did you determine the number of
parameters? Watershed (4) + reach (2) +reach x watershed (1) = 7?
And how many parameters are in this model? 9?
model<-glmer(Y ~ wsh*rip + (1|stream) + (1|stream:rip) + (1|obs),
data=ept, family=binomial(link="logit"))
Second: is there a simple explanation for why 10 observations per
parameter is the rule of thumb?
Thank you
Colin
On Tue, Nov 22, 2011 at 12:59 AM, ONKELINX, Thierry
<Thierry.ONKELINX at inbo.be> wrote:
> Dear Colin,
>
> Your sample size is too small for you model. You model has 7 fixed and 1 random parameter. You could use 10 observations per parameters as a rule of thumb. Hence your model would require about 80 obversations.
>
> This leaves you with two options: a) collect more data b) reduce the model.
>
> Best regards,
>
> Thierry
>
>> -----Oorspronkelijk bericht-----
>> Van: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-
>> bounces at r-project.org] Namens Colin Wahl
>> Verzonden: maandag 21 november 2011 23:54
>> Aan: r-sig-mixed-models at r-project.org
>> Onderwerp: [R-sig-ME] Very large HPDintervals, mcmcsamp, lmer
>>
>> Hello all,
>> I am using lmer to describe patterns in stream variables between streams
>> draining watersheds with different land uses. I need to make some kind of
>> determination of significance. I understand that mixed models cannot provide
>> accurate p values because the denominator df is unknown. As per D. Bates
>> recommendation in his post "lmer, p-values and all that" I used mcmcsamp and
>> HPDinterval to get 95% "confidence intervals." These intervals were very large
>> and suggest that there are no significant differences in variables where
>> differences seem obvious. I'll use one of my variables as an example: total
>> stream nitrogen. There are 12 streams and 24 sites (two riparian types per
>> stream).
>>
>> model<-lmer(TN ~ wsh*rip + (1|stream), data=all24)
>> Data: all24
>> AIC BIC logLik deviance REMLdev
>> 78.47 90.25 -29.24 65.15 58.47
>> Random effects:
>> Groups Name Variance Std.Dev.
>> stream (Intercept) 4.10443 2.0259
>> Residual 0.20885 0.4570
>> Number of obs: 24, groups: stream, 12
>>
>> Fixed effects:
>> Estimate Std. Error t value
>> (Intercept) 6.04887 1.19903 5.045
>> wshD -4.65847 1.69569 -2.747
>> wshF -4.85194 1.58617 -3.059
>> wshG -5.18847 1.89584 -2.737
>> ripN -0.18540 0.37314 -0.497
>> wshD:ripN 0.15800 0.52770 0.299
>> wshF:ripN 0.09825 0.49362 0.199
>> wshG:ripN 0.00915 0.58998 0.016
>>
>> Estimates are accurate; intercept(cultivated) watersheds clearly have a much
>> higher nitrogen concentration than the other watersheds. I use mcmcsamp and
>> HPD interval to get 95% CIs:
>>
>> model.mcmc<-mcmcsamp(model, n=1000)
>> model.HPDi<-HPDinterval(model.mcmc, prob=0.95)
>>
>> Estimate Std. Error t
>> lower upper
>> Cultivated 6.04886663 1.19903192 5.04479200 3.75759240 8.40959042
>> Developed 1.39040003 1.69568720 -2.74724406 -4.27036665
>> 6.80232811
>> Forested 1.19692503 1.58617014 -3.05890364 -4.53524163 6.52712926
>> Grassland 0.86039999 1.89583593 -2.73676987 -4.96308190 6.93964588
>> Riparia(Cult) 5.86346654 0.37313911 -0.49686587 0.52284572
>> 11.44574598
>> Riparia(Devel) 1.36300000 0.52769839 0.29941356 -12.15807916
>> 14.22417096
>> Riparia(Fores) 1.10977501 0.49361665 0.19904123 -11.93267982
>> 13.77391850
>> Riparia(Grass) 0.68414998 0.58998474 0.01550901 -13.67422625
>> 14.50577622
>>
>> Considering that the CI are substantially overlapped, I interpret these CIs to
>> suggest there is no detectable differences between cultivated streams and other
>> streams. This cant be right.
>>
>> I have a very elementary understanding of Bayesian statistics, and have not used
>> mcmc or HPDinterval before. Is it apparent that I've done something wrong? Is
>> my sample size too small? I hope this isnt just a problem with calculating CIs
>> from the output, which I'm pretty sure is done the same way its done with lmer
>> (see below).
>>
>> Please help; I'm desperate to wrap up this analysis.
>>
>> Thank you,
>> Colin Wahl
>> Graduate Student
>> Western Wash. University
>> Bellingham, WA USA
>>
>>
>>
>> CI recalculation:
>>
>> I would be confident that CIs for each fixed effect are relative to the intercept,
>> but the CIs look wrong so I am questioning myself. Is it true that the
>> HPDintervals for each fixed effect are relative to the intercept coefficient, just
>> as they are in the initial lmer output?
>>
>> i.e...
>> lower upper
>> (Intercept) 3.684270 8.626003
>> wshD -8.256167 -1.259411
>> wshF -8.469073 -1.852237
>> wshG -9.338573 -1.498553
>> ripN -3.309220 3.105756
>> wshD:ripN -4.412022 4.944243
>> wshF:ripN -4.452947 4.422371
>> wshG:ripN -5.510786 4.917613
>>
>> becomes:
>>
>> lower upper
>> (Intercept) 3.6842697 8.626003
>> wshD -4.5718974 7.366592
>> wshF -4.7848029 6.773766
>> wshG -5.6543034 7.127450
>> ripN 0.3750498 11.731759
>> ripD -12.2931391 15.416590
>> ripF -12.5469694 14.301893
>> ripG -14.4743090 15.150819
>>
>> Using the following code:
>> ci<-as.matrix(model.HPDi$fixef)
>> C=ci[1,]
>> D=C+ci[2,]
>> FF=C+ci[3,]
>> G=C+ci[4,]
>> CN=C+ci[5,]
>> DN=(D)+(ci[5,]+ci[6,])
>> FN=(FF)+(ci[5,]+ci[7,])
>> GN=(G)+(ci[5,]+ci[8,])
>> confint<-c(C,D,FF,G,CN,DN,FN,GN)
>>
>>
>> --
>> CW
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
CW
More information about the R-sig-mixed-models
mailing list