[R-sig-ME] How can I make R using more than 1 core (8 available) on a Ubuntu Rstudio server ?
Paul Johnson
paul.johnson at glasgow.ac.uk
Tue Feb 13 13:23:59 CET 2018
You can use anova(fit1, fit2) to compare nested models fitted with glmmTMB. However, given that you have a large data set, you might want to avoid refitting the model, in which case you could use a Wald z test (this ignores the residual df, i.e. assumes it is infinite, but that is probably fine with big data). The test uses a restriction matrix to test the global null hypothesis that all k-1 factor-level effects are zero. It can be used more flexibly than that, but this is mainly how I use it: getting p-values for factors and splines from models that would be slow to refit. Here is some example code (probably not very general, but it works for me).
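To make the restriction matrix concrete (an illustration added for clarity; the example code further down builds the same matrix programmatically): in the cbpp example below, period has four levels, so the conditional fixed effects are (Intercept), period2, period3 and period4, and R is a 3 x 4 matrix with one row per non-reference level:

R <- rbind(c(0, 1, 0, 0),   # period2 = 0
           c(0, 0, 1, 0),   # period3 = 0
           c(0, 0, 0, 1))   # period4 = 0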
# Wald z-test function
# (adapted from http://stijnruiter.nl/blog/?p=309 - now a broken link)
# this Wald test method is dubious at low n because it doesn't take account of the
# degrees of freedom.
# (see Bolker et al. Trends in Ecology & Evolution, Volume 24, Issue 3, 127-135, 29 January 2009).
# 2017-07-02: adapted to take glmmTMB fits
Wald <- function(object, R, q = NULL, glmmTMB.model = "cond")
{
    if (!is.matrix(R)) stop("Restrictions must be a matrix")
    if (is.null(q)) q <- rep(0, nrow(R))
    b <- fixef(object)
    if (inherits(b, "fixef.glmmTMB")) b <- b[[glmmTMB.model]]
    vc <- vcov(object)
    if (inherits(vc, "vcov.glmmTMB")) vc <- vc[[glmmTMB.model]]
    # Wald statistic: (Rb - q)' [R V R']^{-1} (Rb - q), chi-squared on nrow(R) df under H0
    w <- t(R %*% b - q) %*% solve(R %*% vc %*% t(R)) %*% (R %*% b - q)
    pw <- pchisq(w[1], length(q), lower.tail = FALSE)
    cat("*************\n* Wald Test *\n*************\n")
    cat("Fixed effects:\n")
    print(fixef(object))
    cat("\nRestrictions:\n")
    print(R)
    cat("\nq = ", q)
    cat("\nChi-square:", round(w[1], 3), " df = ", length(q))
    cat("\nProb x>chisq:", round(pw, 5), "\n")
    pw
}
library(glmmTMB)
# example from ?glmmTMB
data(cbpp, package="lme4")
(tmbm1 <- glmmTMB(cbind(incidence, size - incidence) ~ period + (1 | herd),
                  data = cbpp, family = binomial))
# LRT
anova(update(tmbm1, ~ . - period), tmbm1)
# Wald z test
# quick and dirty way to construct the R matrix
# (is there a way to extract this directly from the fitted model?)
eff.names <- names(fixef(tmbm1)$cond)
p.names <- grep("period", eff.names, value = TRUE, fixed = TRUE)
# one row per period coefficient, with a 1 in that coefficient's column
R <- t(sapply(p.names, function(en) as.numeric(eff.names == en)))
# p-value
Wald(tmbm1, R = R)
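As a quick sanity check (not in the original example, but it follows directly from the definition), a single-row restriction matrix reduces to the usual one-coefficient Wald test, so the 1-df chi-square should equal the squared z value that summary() reports for that coefficient:

# test H0: period2 = 0 alone; chi-square should equal z^2 for period2
R1 <- matrix(c(0, 1, 0, 0), nrow = 1)
Wald(tmbm1, R = R1)
summary(tmbm1)$coefficients$cond["period2", ]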
> On 13 Feb 2018, at 08:15, Nicolas Bédère <n.bedere at gmail.com> wrote:
>
> Dear glmmTMB users,
>
> I have tried the solution proposed by Ben Bolker; since our server is an
> RStudio server, I could not try Julia.
> It runs faster indeed, thanks! I don't want to reopen an old
> fight... but to publish in animal science we need p-values, whatever the
> good reasons for not reporting them... our community is not ready yet!
> With lmer I used to get the global effects with car::Anova. The
> glmmTMB objects are not supported by this function. Do you know any
> solution to get the global effect of a factor instead of the effect of
> each level?
>
> Many thanks,
> Nicolas
>
> 2018-01-21 0:24 GMT+01:00 Hans Ekbrand <hans.ekbrand at gmail.com>:
>
>> On Thu, Jan 18, 2018 at 03:36:08PM -0500, Ben Bolker wrote:
>>> Explaining a little bit more: unlike a lot of informatics/machine
>>> learning procedures, the algorithm underlying lme4 is not naturally
>>> parallelizable. There are components that *could* be done in parallel,
>>> but it's not simple.
>>>
>>> If you need faster computation, you could either try Doug's
>>> MixedModels.jl package for Julia, or the glmmTMB package (on CRAN),
>>> which may scale better than glmer for problems with large numbers of
>>> fixed-effect parameters (although my guess is that it's close to a tie
>>> for the problem specs you quote below, unless your fixed effects are
>>> factors with several levels).
>>
>> I'm currently analysing a few huge datasets and in one of the cases
>> the outcome was binary (in the other cases, the outcome was count data
>> so I used negative binomial in glmmTMB), so I tried both glmer and
>> glmmTMB and glmmTMB was faster. My model included about 11 fixed
>> effects without interactions and three random intercept terms.
>>
>> However, I had problems getting clean convergence when I tried to fit
>> the model to the complete dataset, both with glmer and glmmTMB, and
>> what I did might help Nicolas Bédère too. I think the convergence
>> problems in my case were related to the fact that the outcome was very
>> rare: only 11,221 cases had the outcome (death), while 5,674,928
>> didn't have the outcome (they were alive).
>>
>> Anyway, I divided the dataset into 8 bins and fitted the same model
>> to each bin, and since I had a 4-core CPU, 4 bins could be
>> fitted independently in parallel. Then I took the estimates and
>> applied Rubin's rules to them to get pooled results.
>>
>> (In my particular case, I left all 11,221 positive cases in each of
>> the 8 bins, while each negative case only appeared in one of the 8
>> bins.)
>>
>> I consider what I did a kind of poor man's bootstrapping, but I
>> would like some feedback on the validity of the results one gets
>> with the method I used. If it is valid, then it is one way of
>> parallelising glmer.
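A minimal sketch (not from the thread) of the split-fit-pool approach described above, assuming the data have already been split into a list of data frames called bins and the model formula is form (both placeholder names), with the fits pooled by Rubin's rules; whether that pooling is strictly valid here is exactly the question raised above:

library(glmmTMB)
library(parallel)

# fit the same model independently to each bin, 4 fits at a time
fits <- mclapply(bins, function(d) glmmTMB(form, data = d, family = binomial),
                 mc.cores = 4)

k    <- length(fits)
est  <- sapply(fits, function(f) fixef(f)$cond)       # p x k matrix of estimates
se2  <- sapply(fits, function(f) diag(vcov(f)$cond))  # squared standard errors
qbar <- rowMeans(est)                                 # pooled estimates
W    <- rowMeans(se2)                                 # mean within-fit variance
B    <- apply(est, 1, var)                            # between-fit variance
Tvar <- W + (1 + 1/k) * B                             # Rubin's total variance
data.frame(estimate = qbar, se = sqrt(Tvar))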
>>
>> --
>> Hans Ekbrand, Fil Dr
>> Epost/email: <hans.ekbrand at gu.se>
>> Telefon/phone: +46-31 786 47 73
>> Institutionen för sociologi och arbetsvetenskap, Göteborgs universitet
>> Department of sociology and work science, Gothenburg university
>>
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models