[R-sig-ME] MCMCglmm univariate model - binary response.
Patricia Soto
patricia.soto.b at gmail.com
Thu Jun 18 10:06:16 CEST 2015
Hello Everyone!
I am using MCMCglmm (Bayesian statistics for the first time in my life!) to
test whether Predictor1 and Predictor2 explain Response1 (disclaimer:
apologies for not naming my variables, but the data are on students and
teachers and go into a paper that I have to resubmit soon!). After reading
Hadfield's tutorial and course vignette many, many times, I came up with
the following specification of my model:
library(MCMCglmm)

## residual variance fixed at 1 (required for family = "categorical");
## flat priors (nu = 0) on the two random-effect variances
priorZ <- list(R = list(V = 1, nu = 0, fix = 1),
               G = list(G1 = list(V = 1, nu = 0), G2 = list(V = 1, nu = 0)))

fac <- 1   # multiplier so the chain can be lengthened later
msgZ <- MCMCglmm(Response1 ~ Predictor1 * Predictor2,   # main effects plus interaction
                 random = ~ Random1 + Random2, prior = priorZ, data = our_data,
                 pr = TRUE, family = "categorical", verbose = TRUE,
                 nitt = 13000 * fac, burnin = 3000 * fac, thin = 10 * fac)
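(A side question, in case it is related: is it enough to eyeball the chains along the
lines below before interpreting anything? I am assuming the standard coda checks
apply to MCMCglmm objects, so please correct me if there is a better routine.)

# rough convergence / mixing checks (coda is loaded together with MCMCglmm)
plot(msgZ$Sol)             # trace and density plots of the fixed effects
plot(msgZ$VCV)             # trace and density plots of the variance components
autocorr.diag(msgZ$VCV)    # autocorrelation of the variance components
effectiveSize(msgZ$Sol)    # effective sample sizes of the fixed effects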
Now the big question I have (and clearly I have not yet understood the
explanations found in a number of sources) is how to make sure my model is
meaningful.
Option 1: I thought I could go with the following:
> inv.logit(HPDinterval(msgZ$Sol, 0.95))
> inv.logit(posterior.mode(msgZ$Sol))
but I am failing to interpret the very narrow lower-upper intervals (what am I
missing? do these values even make sense?). As I read it, the first column is the
back-transformed posterior mode and the last two are the HPD bounds:

                          post.mode        lower       upper
(Intercept)             0.006123962 0.0006289942  0.02294555
Predictor1              0.998453915 0.9941636860  0.99961655
Predictor2.1            0.999625733 0.9884580704  0.99994442
Predictor2.2            0.988390462 0.9484153243  0.99898593
Predictor2.3            0.996238689 0.9871810929  0.99976048
Predictor1:Predictor2.1 0.188756844 0.0188979835  0.8300961
Predictor1:Predictor2.2 0.006322825 0.0016985902  0.02754298
Predictor1:Predictor2.3 0.009168150 0.0016535274  0.02740872
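(For completeness, the table above is just the two calls bound together, roughly as
below; I am assuming boot::inv.logit for the back-transformation, though I think
gtools has an equivalent function.)

library(boot)                                  # for inv.logit
# back-transform posterior modes and 95% HPD bounds from the logit (latent) scale
pm <- inv.logit(posterior.mode(msgZ$Sol))      # one value per column of Sol
ci <- inv.logit(HPDinterval(msgZ$Sol, 0.95))   # lower and upper bounds
head(cbind(post.mode = pm, ci))                # first column = mode, then lower/upper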
I have tried several priors, and while the posterior mode stays about the
same, the lower bound of the credible interval changes dramatically. The
upper bound barely changes, if at all.
Option 2: With this approach, I feel I cannot argue in terms of how my
predictors explain the response variable:
> m.modelZ <- mcmc(mapply(msgZ.data.scale, msgZ$Sol, rowSums(msgZ$VCV)))
> summary(m.modelZ)
Iterations = 1:72000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 72000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

     Mean        SD  Naive SE  Time-series SE
0.5022283 0.2586471 0.0009639       0.0303169

2. Quantiles for each variable:

   2.5%     25%     50%     75%   97.5%
0.04658 0.31975 0.49605 0.69394 0.98146
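(To clarify what I meant by msgZ.data.scale: I was trying to implement the usual
back-transformation to the data scale for a logit-link categorical model from the
course notes, i.e. shrinking the latent-scale prediction by the total variance
before applying the inverse link. My intended definition is roughly the sketch
below; this is my reading of the notes, so please correct me if the correction is
not right.)

# intended definition of msgZ.data.scale (my reading of Hadfield's course notes):
# divide the latent-scale value by sqrt(1 + c2 * total variance), then apply the
# logistic inverse link to get a probability
c2 <- (16 * sqrt(3) / (15 * pi))^2                 # logistic approximation constant
msgZ.data.scale <- function(mu, total.var) {
  plogis(mu / sqrt(1 + c2 * total.var))
}

# e.g. predicted probability for the baseline (intercept) level
p.base <- msgZ.data.scale(msgZ$Sol[, "(Intercept)"], rowSums(msgZ$VCV))
HPDinterval(mcmc(p.base))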
Option 3: Here I thought I could report based on:
> summary(msgZ)

                        post.mean  l-95% CI  u-95% CI  eff.samp   pMCMC
(Intercept)                -5.410    -7.371    -3.751     86.79  <0.001 ***
Predictor1                  6.588     5.138     7.866     60.52  <0.001 ***
Predictor2.1                7.282     4.450     9.798     83.33  <0.001 ***
Predictor2.2                4.833     2.912     6.893     76.30  <0.001 ***
Predictor2.3                6.077     4.344     8.337     96.86  <0.001 ***
Predictor1:Predictor2.1    -1.024    -3.950     1.586     70.63   0.456
Predictor1:Predictor2.2    -5.096    -6.376    -3.564     75.21  <0.001 ***
Predictor1:Predictor2.3    -4.892    -6.403    -3.569     70.98  <0.001 ***
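(In case it matters for the reporting question: I was planning to pull the numbers
out of the summary object roughly as below. I am assuming the component names here,
so please correct me if the slots are called something else.)

summ <- summary(msgZ)
summ$solutions      # posterior means, 95% CIs, effective sizes, pMCMC for fixed effects
summ$Gcovariances   # random-effect variance components
msgZ$DIC            # deviance information criterion, if DIC is worth reporting here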
I can see significant estimates for most of my predictors. Looking at the raw
data, we would actually expect Predictor1 to have the strongest effect (and I
am almost certain I need to refine my definition of the random effects).
What would be the guidelines for reporting results in a peer-reviewed
publication? I know this is discipline-dependent, but I am trying to publish
this paper in a field of research that is new to me, so I am looking for
quality standards that hold across disciplines.
Please, any insight, tip, suggestion is truly appreciated!
Patricia
patricia.soto.b at gmail.com