[R-sig-ME] MCMCglmm univariate model - binary response.
Patricia Soto
patricia.soto.b at gmail.com
Thu Jun 18 10:06:16 CEST 2015
Hello Everyone!
I am using MCMCglmm (Bayesian statistics for the first time in my life!) to
test whether Predictor1 and Predictor2 explain Response1 (disclaimer:
apologies for not naming my variables, but the data are on students and
teachers and go into a paper that I have to resubmit soon!). After reading
Hadfield's tutorial and course vignette many, many times, I came up with
the following specification of my model:
library(MCMCglmm)

## residual variance fixed at 1 (required for family = "categorical");
## flat priors (nu = 0) on the two random-effect variances
priorZ <- list(R = list(V = 1, nu = 0, fix = 1),
               G = list(G1 = list(V = 1, nu = 0), G2 = list(V = 1, nu = 0)))

fac <- 1   # multiplier so the chain can be lengthened later
msgZ <- MCMCglmm(Response1 ~ Predictor1 * Predictor2,   # main effects plus interaction
                 random = ~ Random1 + Random2, prior = priorZ, data = our_data,
                 pr = TRUE, family = "categorical", verbose = TRUE,
                 nitt = 13000 * fac, burnin = 3000 * fac, thin = 10 * fac)
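(A side question, in case it is related: is it enough to eyeball the chains along the
lines below before interpreting anything? I am assuming the standard coda checks
apply to MCMCglmm objects, so please correct me if there is a better routine.)

# rough convergence / mixing checks (coda is loaded together with MCMCglmm)
plot(msgZ$Sol)             # trace and density plots of the fixed effects
plot(msgZ$VCV)             # trace and density plots of the variance components
autocorr.diag(msgZ$VCV)    # autocorrelation of the variance components
effectiveSize(msgZ$Sol)    # effective sample sizes of the fixed effects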
Now the big question I have (and clearly I have not yet understood the
explanations found in a number of sources) is how to make sure my model is
meaningful.
Option 1: I thought I could go with the following:
> inv.logit(HPDinterval(msgZ$Sol, 0.95))
> inv.logit(posterior.mode(msgZ$Sol))
but I am failing to interpret the very narrow lower-upper intervals (what am I
missing? do these values even make sense?). As I read it, the first column is the
back-transformed posterior mode and the last two are the HPD bounds:

                          post.mode        lower       upper
(Intercept)             0.006123962 0.0006289942  0.02294555
Predictor1              0.998453915 0.9941636860  0.99961655
Predictor2.1            0.999625733 0.9884580704  0.99994442
Predictor2.2            0.988390462 0.9484153243  0.99898593
Predictor2.3            0.996238689 0.9871810929  0.99976048
Predictor1:Predictor2.1 0.188756844 0.0188979835  0.8300961
Predictor1:Predictor2.2 0.006322825 0.0016985902  0.02754298
Predictor1:Predictor2.3 0.009168150 0.0016535274  0.02740872
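(For completeness, the table above is just the two calls bound together, roughly as
below; I am assuming boot::inv.logit for the back-transformation, though I think
gtools has an equivalent function.)

library(boot)                                  # for inv.logit
# back-transform posterior modes and 95% HPD bounds from the logit (latent) scale
pm <- inv.logit(posterior.mode(msgZ$Sol))      # one value per column of Sol
ci <- inv.logit(HPDinterval(msgZ$Sol, 0.95))   # lower and upper bounds
head(cbind(post.mode = pm, ci))                # first column = mode, then lower/upper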
I have tried several priors, and while the posterior mode stays about the
same, the lower bound of the credible interval changes dramatically. The
upper bound barely changes, if at all.
Option 2: With this approach, I feel I cannot argue in terms of how my
predictors explain the response variable:
> m.modelZ <- mcmc(mapply(msgZ.data.scale, msgZ$Sol, rowSums(msgZ$VCV)))
> summary(m.modelZ)
Iterations = 1:72000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 72000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

     Mean        SD  Naive SE  Time-series SE
0.5022283 0.2586471 0.0009639       0.0303169

2. Quantiles for each variable:

   2.5%     25%     50%     75%   97.5%
0.04658 0.31975 0.49605 0.69394 0.98146
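(To clarify what I meant by msgZ.data.scale: I was trying to implement the usual
back-transformation to the data scale for a logit-link categorical model from the
course notes, i.e. shrinking the latent-scale prediction by the total variance
before applying the inverse link. My intended definition is roughly the sketch
below; this is my reading of the notes, so please correct me if the correction is
not right.)

# intended definition of msgZ.data.scale (my reading of Hadfield's course notes):
# divide the latent-scale value by sqrt(1 + c2 * total variance), then apply the
# logistic inverse link to get a probability
c2 <- (16 * sqrt(3) / (15 * pi))^2                 # logistic approximation constant
msgZ.data.scale <- function(mu, total.var) {
  plogis(mu / sqrt(1 + c2 * total.var))
}

# e.g. predicted probability for the baseline (intercept) level
p.base <- msgZ.data.scale(msgZ$Sol[, "(Intercept)"], rowSums(msgZ$VCV))
HPDinterval(mcmc(p.base))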
Option 3: Here I thought I could report based on:
> summary(msgZ)

                        post.mean  l-95% CI  u-95% CI  eff.samp   pMCMC
(Intercept)                -5.410    -7.371    -3.751     86.79  <0.001 ***
Predictor1                  6.588     5.138     7.866     60.52  <0.001 ***
Predictor2.1                7.282     4.450     9.798     83.33  <0.001 ***
Predictor2.2                4.833     2.912     6.893     76.30  <0.001 ***
Predictor2.3                6.077     4.344     8.337     96.86  <0.001 ***
Predictor1:Predictor2.1    -1.024    -3.950     1.586     70.63   0.456
Predictor1:Predictor2.2    -5.096    -6.376    -3.564     75.21  <0.001 ***
Predictor1:Predictor2.3    -4.892    -6.403    -3.569     70.98  <0.001 ***
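(In case it matters for the reporting question: I was planning to pull the numbers
out of the summary object roughly as below. I am assuming the component names here,
so please correct me if the slots are called something else.)

summ <- summary(msgZ)
summ$solutions      # posterior means, 95% CIs, effective sizes, pMCMC for fixed effects
summ$Gcovariances   # random-effect variance components
msgZ$DIC            # deviance information criterion, if DIC is worth reporting here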
I can see significant estimates for most of my predictors. Looking at the raw
data, we would actually expect Predictor1 to have the strongest effect (and I
am almost certain I need to refine my definition of the random effects).
What would be the guidelines for reporting results in a peer-reviewed
publication? I know this is discipline-dependent, but I am trying to publish
this paper in a field of research that is new to me, so I am looking for
quality standards that hold across disciplines.
Please, any insight, tip, suggestion is truly appreciated!
Patricia
patricia.soto.b at gmail.com