[R-sig-ME] How can I make R using more than 1 core (8 available) on a Ubuntu Rstudio server ?

Tue Feb 13 13:23:59 CET 2018

You can use anova(fit1, fit2) to compare nested models fitted using glmmTMB. However given that you have a large data set you might want to avoid refitting the model, in which case you could use a Wald z test (this ignores the residual df — i.e. assumes it’s infinite — but probably fine with big data). This test uses a restriction matrix to test the global null hypothesis that all the k-1 factor level effects are zero. In fact it can be used more flexibly than this, but mainly that’s how I use it: for getting p-values for factors and splines from models that would be slow to refit. Here’s some example code (which is probably not very general but works for me).

# Wald z-test function 
# (adapted from http://stijnruiter.nl/blog/?p=309 - now a broken link)
# this Wald test method is dubious at low n because it doesn't take account of the 
# degrees of freedom.
# (see Bolker et al. Trends in Ecology & Evolution, Volume 24, Issue 3, 127-135, 29 January 2009).
# 2017-07-02: adapted to take glmmTMB fits

Wald<-
  function(object, R, q = NULL, gmmTMB.model = "cond")
  {
    if (!is.matrix(R)) stop("Restrictions must be a matrix")
    if(is.null(q)) q <- rep(0, nrow(R))
    b <- fixef(object)
    if(class(b) == "fixef.glmmTMB") b <- b[[gmmTMB.model]]
    vc <- vcov(object)
    if("vcov.glmmTMB" %in% class(vc)) vc <- vc[[gmmTMB.model]]
    w <- t(R %*% b - q) %*% solve(R %*% vc %*% t(R)) %*% (R %*% b - q)
    pw <- pchisq(w[1], length(q), lower.tail = FALSE)
    cat("*************\n* Wald Test *\n*************\n")
    cat("lme4 fixed effects:\n")
    print(fixef(object))
    cat("\nRestrictions:\n")
    print(R)
    cat("\nq = ",q)
    cat("\nChi-square:", round(w[1],3), " df = ", length(q))
    cat("\nProb x>chisq:", round(pw, 5), "\n")
    pw
  }

library(glmmTMB)

# example from ?glmmTMB
data(cbpp, package="lme4")
(tmbm1 <- glmmTMB(cbind(incidence, size-incidence) ~ period + (1 | herd),
                  data=cbpp, family=binomial))

# LRT
anova(update(tmbm1, ~ . - period), tmbm1)

# Wald z test
# quick and dirty way to construct the R matrix
# (is there a way to extract this directly from the fitted model?)
eff.names <- names(fixef(tmbm1)$cond)
p.names <- grep("period", eff.names, value = TRUE, fixed = TRUE)
R <- 
  t(sapply(p.names, 
           function(en) 
             grep(en, eff.names, value = TRUE, fixed = TRUE) == eff.names)) + 0

# p-value
Wald(tmbm1, R = R)

> On 13 Feb 2018, at 08:15, Nicolas Bédère <n.bedere at gmail.com> wrote:
> 
> Dear glmmTMB users,
> 
> I have tried the solution proposed by Ben Bolker since our serveur is a
> R-studio serveur, I could not try Julia.
> It runs faster indeed, thanks ! I don't want to put on a scene an old
> fight... but to publish in animal science we need P-values, whatever are
> the good reasons for not getting some... our community is not ready yet !
> with the lmer packages I used to get the global effects with Anova:car. The
> glmmTMB objects are not supported by this function. Do you know any
> solution to get the global effect of the factor instead of the effect of
> each levels ?
> 
> Many thanks,
> Nicolas
> 
> 2018-01-21 0:24 GMT+01:00 Hans Ekbrand <hans.ekbrand at gmail.com>:
> 
>> On Thu, Jan 18, 2018 at 03:36:08PM -0500, Ben Bolker wrote:
>>>  Explaining a little bit more; unlike a lot of informatics/machine
>>> learning procedures, the algorithm underlying lme4 is not naturally
>>> parallelizable. There are components that *could* be done in parallel,
>>> but it's not simple.
>>> 
>>>  If you need faster computation, you could either try Doug's
>>> MixedModels.jl package for Julia, or the glmmTMB package (on CRAN),
>>> which may scale better than glmer for problems with large numbers of
>>> fixed-effect parameters (although my guess is that it's close to a tie
>>> for the problem specs you quote below, unless your fixed effects are
>>> factors with several levels).
>> 
>> I'm currently analysing a few huge datasets and in one of the cases
>> the outcome was binary (in the other cases, the outcome was count data
>> so I used negative binomial in glmmTMB), so I tried both glmer and
>> glmmTMB and glmmTMB was faster. My model included about 11 fixed
>> effects without interactions and three random intercept terms.
>> 
>> However, I had problem getting a clean convergence when I tried to fit
>> the model to the complete dataset, both with glmer and glmmTMB, and
>> what I did might help Nicolas Bédère too. I think the convergence
>> problems in my case was related to the fact that the outcome was very
>> rare, only 11.221 cases had the outcome (death), while 5.674.928
>> didn't have the outcome (the were alive).
>> 
>> Anyway, I divided the dataset into 8 bins, and fitted the same model
>> to each dataset, and since I had a 4 core CPU, 4 datasets could be
>> independently fitted in parallel. Then I took the estimates and
>> applied Rubin's Rule on them, to get pooled results.
>> 
>> (In my particular case, I left all 11.221 positive cases in each of
>> the 8 datasets, while each negative case only appeared in one of the 8
>> datasets.)
>> 
>> I consider what I did as a kind of poor-man's-bootstrapping, but I
>> would like to have some feedback on the valididity of results one gets
>> with the method I used. If it is valid, then it is one way of
>> parallelising glmer.
>> 
>> --
>> Hans Ekbrand, Fil Dr
>> Epost/email: <hans.ekbrand at gu.se>
>> Telefon/phone: +46-31 786 47 73
>> Institutionen för sociologi och arbetsvetenskap, Göteborgs universitet
>> Department of sociology and work science, Gothenburg university
>> 
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> 
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models