[R-sig-ME] MCMCglmm for binomial models? -- updated to use rcppbugs

Whit Armstrong armstrong.whit at gmail.com
Fri Apr 27 03:42:23 CEST 2012


I know this is an old thread, but since I've just posted the first
version of rcppbugs on CRAN, I thought I'd see if I could convert this
model.

This model is implemented in pure R (no C++ required).  The source
code is posted below.

It runs in about 30 seconds (for 300k burn-in and 300k iterations) on my
workstation and shows results similar to the first post I made back
in 2010.

Unfortunately, I had to update one of the functions in the package (to
allow for more arguments), so this code will not run with the version
currently on CRAN.  A new version should be released by Monday, though.

Comments welcome.

-Whit


warmstrong at krypton:~/dvl/scripts/bugs$ Rscript herd.r
Loading required package: Rcpp
Loading required package: methods
Loading required package: RcppArmadillo
Loading required package: Matrix
Loading required package: lattice

Attaching package: ‘lme4’

The following object(s) are masked from ‘package:stats’:

    AIC, BIC

running rcppbugs model...
runtime:
   user  system elapsed
 27.329   0.024  27.425
b:
[1] -1.472493 -1.279835 -1.327631 -1.928502
b herd:
               [,1]
 [1,]  0.0558199125
 [2,] -0.0812092810
 [3,]  0.0563029683
 [4,] -0.1821773460
 [5,] -0.0192440062
 [6,] -0.0381335848
 [7,]  0.1131237285
 [8,]  0.0770301723
 [9,] -0.0583487056
[10,] -0.0953579466
[11,]  0.0006177299
[12,] -0.0585625319
[13,] -0.1661447844
[14,]  0.0719714339
[15,] -0.1351594312
warmstrong at krypton:~/dvl/scripts/bugs$


source code follows:

library(rcppbugs)
library(lme4)
data(cbpp)

## response counts and binomial denominators
cbpp.incidence <- cbpp[,"incidence"]
n <- cbpp[,"size"]

## fixed-effects design matrix: intercept plus dummy columns for periods 2-4
X <- matrix(0, nrow=nrow(cbpp), ncol=4)
X[1:nrow(X) + (as.integer(cbpp[,"period"]) - 1) * nrow(cbpp)] <- 1
X[,1] <- 1.0
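## The same design matrix can also be built with model.matrix(); this is an
## untested alternative sketch, not part of the run reported above:
## X <- model.matrix(~ period, data = cbpp)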

herd <- as.integer(cbpp[,"herd"])
NR <- nrow(cbpp)
J <- length(unique(herd))

## priors: uniform priors on the precisions, a vague normal prior on the
## fixed effects, and normal random effects for herd and overdispersion
tau.overdisp <- mcmc.uniform(runif(1), 0, 100)
overdisp <- mcmc.normal(rnorm(NR), 0, tau.overdisp)
b <- mcmc.normal(rnorm(ncol(X)), mu=0, tau=0.0001)
tau.b.herd <- mcmc.uniform(runif(1), 0, 100)
b.herd <- mcmc.normal(rnorm(J), mu=0, tau=tau.b.herd)

## inverse-logit of the linear predictor
phi <- deterministic(function(X, herd, b, b.herd, overdisp) {
  1 / (1 + exp(-(X %*% b + b.herd[herd] + overdisp)))
}, X, herd, b, b.herd, overdisp)

incidence <- mcmc.binomial(cbpp.incidence, n=n, p=phi, observed=TRUE)
m <- create.model(tau.overdisp, overdisp, b, tau.b.herd, b.herd, phi, incidence)

iterations <- 3e5L
burn <- iterations
adapt <- 1e3L
thin <- 10L

cat("running rcppbugs model...\n")
rcppbugs.time <- system.time(ans <- run.model(m,
iterations=iterations, burn=burn, adapt=adapt, thin=thin))
cat("runtime:\n")
print(rcppbugs.time)

cat("b:\n")
print(apply(ans[["b"]],2,mean))

cat("b herd:\n")
print(as.matrix(apply(ans[["b.herd"]],2,mean)))

##cat("overdisp:")
##print(as.matrix(apply(ans[["overdisp"]],2,mean)))
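
For reference, here is a rough lme4 sketch of what I believe is the
corresponding model, using an observation-level random effect to stand in
for the additive overdispersion term.  This is just an illustrative
comparison and was not part of the run above; the "obs" factor is something
I am adding here, not a column of the cbpp data:

library(lme4)
data(cbpp)
## one factor level per observation, mimicking the additive overdispersion term
cbpp$obs <- factor(seq_len(nrow(cbpp)))
gm <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd) + (1 | obs),
            family = binomial, data = cbpp)
fixef(gm)  ## compare with the posterior means of b printed above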



On Tue, Sep 21, 2010 at 4:26 PM, Whit Armstrong
<armstrong.whit at gmail.com> wrote:
> So much for intuition: consolidating the b's doesn't make any material
> speed difference, but it does simplify the model code a bit.
>
> By the way, this run and the previous post both used 1e6 iterations,
> 1e5 burn-in, and a thin of 50: m.sample(1e6, 1e5, 50).
>
> -Whit
>
>
> warmstrong at krypton:~/dvl/c++/CppBugs/test$ g++ -I.. -Wall -O2
> herd.fast.cpp -llapack
> warmstrong at krypton:~/dvl/c++/CppBugs/test$ time ./a.out
> samples: 17999
> b:
>  -1.5077
>  -1.2344
>  -1.3518
>  -1.8819
>
> tau_overdisp: 1.54806
> tau_b_herd: 22.059
> sigma_overdisp: 0.878115
> sigma_b_herd: 0.245386
> b_herd:
>   0.1464
>  -0.0475
>   0.0739
>   0.0070
>  -0.0353
>  -0.0866
>   0.1661
>   0.0658
>  -0.0585
>  -0.0904
>   0.0052
>  -0.0138
>  -0.1324
>   0.0947
>  -0.0902
>
>
> real    0m21.905s
> user    0m21.880s
> sys     0m0.020s
> warmstrong at krypton:~/dvl/c++/CppBugs/test$
>
>
> updated code:
> class HerdModel: public MCModel {
>  const ivec incidence;
>  const ivec size;
>  const ivec herd;
>  const mat fixed;
>  int N, N_herd;
>  mat permutation_matrix;
>
> public:
>  NormalStatic<vec> b;
>  UniformStatic<double> tau_overdisp;
>  UniformStatic<double> tau_b_herd;
>  Deterministic<double> sigma_overdisp;
>  Deterministic<double> sigma_b_herd;
>  Normal<vec> b_herd;
>  Normal<vec> overdisp;
>  Deterministic<vec> phi;
>  Binomial<ivec> likelihood;
>
>
>  HerdModel(const ivec& incidence_, const ivec& size_, const ivec& herd_,
>            const mat& fixed_, int N_, int N_herd_):
>    incidence(incidence_),size(size_),herd(herd_),
>    fixed(fixed_),N(N_),N_herd(N_herd_),permutation_matrix(N,N_herd),
>    b(randn<vec>(4),0,0.001),tau_overdisp(1,0,1000),tau_b_herd(1,0,100),
>    sigma_overdisp(1),sigma_b_herd(1),
>    b_herd(randn<vec>(N_herd_)),overdisp(randn<vec>(N)),
>    phi(randu<vec>(N)),
>    likelihood(incidence_,true)
>  {
>    permutation_matrix.fill(0.0);
>    for(uint i = 0; i < herd.n_elem; i++) {
>      permutation_matrix(i,herd[i]) = 1.0;
>    }
>    add(b);
>    add(tau_overdisp);
>    add(tau_b_herd);
>    add(sigma_overdisp);
>    add(sigma_b_herd);
>    add(b_herd);
>    add(overdisp);
>    add(phi);
>    add(likelihood);
>  }
>
>  void update() {
>    phi.value = fixed*b.value + sum(permutation_matrix*b_herd.value,1) + overdisp.value;
>    phi.value = 1/(1+exp(-phi.value));
>    sigma_overdisp.value = 1/sqrt(tau_overdisp.value);
>    sigma_b_herd.value = 1/sqrt(tau_b_herd.value);
>  }
>  double logp() const {
>    return b.logp() + tau_overdisp.logp() + tau_b_herd.logp() +
>      b_herd.logp(0, tau_b_herd.value) + overdisp.logp(0, tau_overdisp.value) +
>      likelihood.logp(size, phi.value);
>  }
> };
>
>
>
> On Tue, Sep 21, 2010 at 3:41 PM, Whit Armstrong
> <armstrong.whit at gmail.com> wrote:
>> Bob,
>>
>> Just wanted to post the results from my project.  It is still very
>> much alpha-quality software, but the speed is good.  Your sample model
>> runs in 22 seconds on my workstation.  I ran the model in JAGS earlier,
>> but it took ages (20 minutes or so).
>>
>> These are my results which are closer to the MCMCglmm run than the
>> WinBUGS run.  Eventually, I'll try to make it easy to run these models
>> directly from R.
>>
>> The full code for this model is posted in-line below; the guts,
>> however, are just two functions that split the WinBUGS model into
>> separate "update" and "logp" functions:
>>
>>  void update() {
>>    phi.value = b0.value + b_period2.value*period2 + b_period3.value*period3 +
>>      b_period4.value*period4 + sum(permutation_matrix*b_herd.value,1) + overdisp.value;
>>    phi.value = 1/(1+exp(-phi.value));
>>    sigma_overdisp.value = 1/sqrt(tau_overdisp.value);
>>    sigma_b_herd.value = 1/sqrt(tau_b_herd.value);
>>  }
>>  double logp() const {
>>    return b0.logp() + b_period2.logp() + b_period3.logp() + b_period4.logp() +
>>      tau_overdisp.logp() + tau_b_herd.logp() + b_herd.logp(0, tau_b_herd.value) +
>>      overdisp.logp(0, tau_overdisp.value) + likelihood.logp(size, phi.value);
>>  }
>>
>> The speed of the model could probably be improved by consolidating all
>> the b's into a vector; I'll try that later on.  Code for this model,
>> and some other examples, will go up on GitHub sometime in the near
>> future.
>>
>> I've also cc'd Chris Fonnesbeck, since he may want to chime in on the
>> pymc front.  I haven't had time to code up an equivalent pymc model,
>> but it should be very easy to do.
>>
>> Feedback welcome.
>>
>> -Whit
>>
>>
>> results:
>> warmstrong at krypton:~/dvl/c++/CppBugs/test$ time ./a.out
>> samples: 17999
>> b0: -1.52515
>> b_period2: -1.21971
>> b_period3: -1.32685
>> b_period4: -1.88475
>> tau_overdisp: 1.59118
>> tau_b_herd: 25.9476
>> sigma_overdisp: 0.886191
>> sigma_b_herd: 0.258111
>> b_herd:
>>   0.1623
>>  -0.0524
>>   0.0852
>>   0.0080
>>  -0.0506
>>  -0.0953
>>   0.1859
>>   0.0835
>>  -0.0690
>>  -0.0897
>>   0.0096
>>  -0.0183
>>  -0.1496
>>   0.1133
>>  -0.1061
>>
>>
>> real    0m21.930s
>> user    0m21.920s
>> sys     0m0.000s
>> warmstrong at krypton:~/dvl/c++/CppBugs/test$
>>
>>
>> model code:
>> class HerdModel: public MCModel {
>>  const ivec incidence;
>>  const ivec size;
>>  const ivec herd;
>>  const vec period2;
>>  const vec period3;
>>  const vec period4;
>>  int N, N_herd;
>>  mat permutation_matrix;
>>
>> public:
>>  NormalStatic<double> b0;
>>  NormalStatic<double> b_period2;
>>  NormalStatic<double> b_period3;
>>  NormalStatic<double> b_period4;
>>  UniformStatic<double> tau_overdisp;
>>  UniformStatic<double> tau_b_herd;
>>  Deterministic<double> sigma_overdisp;
>>  Deterministic<double> sigma_b_herd;
>>  Normal<vec> b_herd;
>>  Normal<vec> overdisp;
>>  Deterministic<vec> phi;
>>  Binomial<ivec> likelihood;
>>
>>
>>  HerdModel(const ivec& incidence_, const ivec& size_, const ivec& herd_,
>>            const vec& period2_, const vec& period3_, const vec& period4_,
>>            int N_, int N_herd_):
>>    incidence(incidence_),size(size_),herd(herd_),
>>    period2(period2_),period3(period3_),period4(period4_),
>>    N(N_),N_herd(N_herd_),permutation_matrix(N,N_herd),
>>    b0(0,0,0.001),b_period2(0,0,0.001),b_period3(0,0,0.001),b_period4(0,0,0.001),
>>    tau_overdisp(1,0,1000),tau_b_herd(1,0,100),
>>    sigma_overdisp(1),sigma_b_herd(1),
>>    b_herd(randn<vec>(N_herd_)),overdisp(randn<vec>(N)),
>>    phi(randu<vec>(N)),
>>    likelihood(incidence_,true)
>>  {
>>    permutation_matrix.fill(0.0);
>>    for(uint i = 0; i < herd.n_elem; i++) {
>>      permutation_matrix(i,herd[i]) = 1.0;
>>    }
>>    add(b0);
>>    add(b_period2);
>>    add(b_period3);
>>    add(b_period4);
>>    add(tau_overdisp);
>>    add(tau_b_herd);
>>    add(sigma_overdisp);
>>    add(sigma_b_herd);
>>    add(b_herd);
>>    add(overdisp);
>>    add(phi);
>>    add(likelihood);
>>  }
>>
>>  void update() {
>>    phi.value = b0.value + b_period2.value*period2 + b_period3.value*period3 +
>>      b_period4.value*period4 + sum(permutation_matrix*b_herd.value,1) + overdisp.value;
>>    phi.value = 1/(1+exp(-phi.value));
>>    sigma_overdisp.value = 1/sqrt(tau_overdisp.value);
>>    sigma_b_herd.value = 1/sqrt(tau_b_herd.value);
>>  }
>>  }
>>  double logp() const {
>>    return b0.logp() + b_period2.logp() + b_period3.logp() + b_period4.logp() +
>>      tau_overdisp.logp() + tau_b_herd.logp() + b_herd.logp(0, tau_b_herd.value) +
>>      overdisp.logp(0, tau_overdisp.value) + likelihood.logp(size, phi.value);
>>  }
>> };
>>
>>
>>
>> On Mon, Sep 20, 2010 at 5:47 PM, Bob Farmer <farmerb at gmail.com> wrote:
>>> Thanks for the further detail.  I wasn't familiar with these two
>>> methods of overdispersion; I think I only understood them in the
>>> WinBUGS sense, which, I believe, is additive (e.g. overdisp[i] as a
>>> parameter).  I'll have to explore the family() helpfiles a bit more.
>>>
>>> Below is a summary of the model outputs from the three methods
>>> described earlier.  Note that here, I bumped the WinBUGS iterations to
>>> 750E3 to reduce the Rhats a bit further.
>>> I also use an adaptation of the bugsParallel function to make my
>>> WinBUGS runs 3x faster (time for this run:  5.8 minutes); I can
>>> provide details if anybody else would like to use this function (and
>>> some other bugs summary functions I've written).
>>> Hope this is useful or interesting -- I appreciate this discussion!
>>>
>>> --Bob
>>>
>>> **glmer**
>>>> summary(gm3)@coefs
>>>              Estimate Std. Error   t value
>>> (Intercept) -1.3985351 0.01698321 -82.34810
>>> period2     -0.9923347 0.02275838 -43.60304
>>> period3     -1.1286754 0.02429833 -46.45075
>>> period4     -1.5803739 0.03195596 -49.45474
>>>
>>> **MCMCglmm**
>>>> summary(mc3)
>>>  Iterations = 749351
>>>  Thinning interval  = 100001
>>>  Sample size  = 1000
>>>  DIC: 539.7889
>>>  G-structure:  ~herd
>>>     post.mean  l-95% CI u-95% CI eff.samp
>>> herd    0.2894 1.309e-06    1.035     1000
>>>  R-structure:  ~units
>>>      post.mean l-95% CI u-95% CI eff.samp
>>> units      0.93 0.003825     1.98     1000
>>>  Location effects: cbind(incidence, size - incidence) ~ period
>>>            post.mean l-95% CI u-95% CI eff.samp  pMCMC
>>> (Intercept)   -1.5263  -2.2221  -0.9602   1000.0 <0.001 ***
>>> period2       -1.2407  -2.2072  -0.2449    947.2  0.012 *
>>> period3       -1.3574  -2.4534  -0.3472   1000.0  0.012 *
>>> period4       -1.9126  -3.2662  -0.8298   1136.0  0.002 **
>>>
>>> **WinBUGS**
>>>> bug3$summary[which(rownames(bug3$summary) %in% params), c("mean", "sd", "2.5%", "97.5%", "Rhat", "n.eff")]
>>>                     mean        sd        2.5%      97.5%     Rhat n.eff
>>> B.0            -1.6070301 0.3556674 -2.33980000 -0.9947425 1.039548    69
>>> B.period2      -1.2761798 0.4895902 -2.14292500 -0.2391200 1.018612   210
>>> B.period3      -1.5100405 0.6337780 -2.63680000 -0.2266425 1.123427    22
>>> B.period4      -2.0678331 0.6539004 -3.31892500 -0.8586700 1.018622   270
>>> sigma.overdisp  1.1975969 0.3147531  0.69358404  1.8740000 1.045062    55
>>> sigma.b.herd    0.5120043 0.3825848  0.01658429  1.3659249 1.542432     7
>>> (note that I can provide a link to parameter histograms, if you want
>>> more detail)
>>>
>>> --Bob
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>


