[R-sig-ME] performance of lmer with large datasets

Douglas Bates bates at stat.wisc.edu
Fri Sep 16 20:24:25 CEST 2011


On Fri, Sep 16, 2011 at 10:46 AM,  <jpl2136 at columbia.edu> wrote:
> Hi,
>
> I am running some pretty big mlm models using glmer right now and they are taking forever. So I really would like to speed things up.
> Here is the data setup: I have around 130 datasets on the dyad level with 3,600 to 4,000,000 observations, the outcome is binary, I am using two cross-classified random intercepts (for i and j in the dyad) with equal group sizes, and about 35 covariates. For each dataset, I am currently fitting 4 models with slightly different specifications and there will be more. Right now, I am running all this in the cloud on 9 computers. I started the job over 4 days ago and I still only have results for 26 of the datasets.
>
> Here are some thoughts about speeding this up and I welcome any suggestions:
> 1) I could use a beowulf cluster to send each dataset to one cpu. I actually have access to one but it would require some work on the server side and I don't want to burden the cluster administrator too much. I am also a little worried about memory because some nodes with multiple cpus would fit the models for different datasets at the same time.
>
> 2) I could replace one of the random effects with a fixed effect (which I would actually prefer). But I think it's computationally more demanding as long as I use dummies to do that. Is there an easy way to implement the standard fixed effect data transformation to speed this up? Any existing implementations?
>
> 3) Speeding up the fitting process itself. I am not sure about this option and would be happy about any suggestions.

Which version of lme4?  Could you send the output of sessionInfo() in
R after loading the lme4 package?
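
For reference, the information I am asking for can be collected as below. This is only a sketch and assumes lme4 is installed in the library you are using; the name si is just for illustration.

```r
# Load the package first so that it shows up under "other attached packages".
library(lme4)

# R version, platform, and the versions of all attached packages.
si <- sessionInfo()
print(si)

# The lme4 version on its own.
packageVersion("lme4")
```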

The 35 covariates are likely the cause of the slow fitting.  In a
linear mixed-effects model you can remove the fixed-effects parameters
from the general optimization of the deviance because the conditional
estimates of the fixed-effects, given the other parameters, can be
determined in a direct computation.  In a generalized linear
mixed-effects model you can do the same thing but you aren't
guaranteed to get exactly the conditional estimate.  The experimental
versions of glmer allow you to use this short-cut method to fit the
model and usually produce answers that are sufficiently close to the
optimum that the difference in the deviance is nearly negligible
whereas the difference in time for fitting the model can be enormous.
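
In lme4 releases from 1.0 onward (an assumption about your installation, since they are newer than the experimental versions mentioned above) this short-cut is selected with nAGQ = 0 in glmer(). Below is a minimal sketch on simulated data with two crossed random intercepts. The data frame d, its column names, and the sizes are made up for illustration and are far smaller than your datasets.

```r
library(lme4)

set.seed(1)
n_i <- 30
n_j <- 30

# One binary observation per (i, j) dyad; both factors get a random intercept.
d <- expand.grid(i = factor(seq_len(n_i)), j = factor(seq_len(n_j)))
d$x <- rnorm(nrow(d))
u <- rnorm(n_i, sd = 0.5)
v <- rnorm(n_j, sd = 0.5)
eta <- -0.5 + 0.8 * d$x + u[as.integer(d$i)] + v[as.integer(d$j)]
d$y <- rbinom(nrow(d), size = 1, prob = plogis(eta))

# Default fit: fixed effects are part of the nonlinear optimization.
t_full <- system.time(
  fit_full <- glmer(y ~ x + (1 | i) + (1 | j), data = d,
                    family = binomial, nAGQ = 1)
)

# Short-cut fit: fixed effects are determined inside the PIRLS step,
# so only the variance parameters are optimized.
t_fast <- system.time(
  fit_fast <- glmer(y ~ x + (1 | i) + (1 | j), data = d,
                    family = binomial, nAGQ = 0)
)

# Compare elapsed time, -2 log-likelihood, and the fixed-effects estimates.
dev_full <- as.numeric(-2 * logLik(fit_full))
dev_fast <- as.numeric(-2 * logLik(fit_fast))
print(c(full = t_full[["elapsed"]], fast = t_fast[["elapsed"]]))
print(c(full = dev_full, fast = dev_fast))
print(cbind(full = fixef(fit_full), fast = fixef(fit_fast)))
```

With a single covariate the timing difference will be small; it is with many fixed-effects parameters, as in your 35-covariate models, that I would expect the gain to matter.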

> 4) The 'mer' objects get pretty big and I would like to reduce their size to make later post-processing easier. I am using the function below to do that, which reduces the object size in R to ~5% of the original size. So everything seems to be fine, but when I save the 'mer' objects using save(fit1,fit2,file=[PATH]), the RData file is still up to 400 MB, which is roughly the size of the R objects before using the mer.clean function. Any suggestions?

save(fit1, fit2, file=[PATH], compress="xz")

tends to provide the best file compression in the saved object,
although it will often take longer to save the object because of the
calculations involved in the compression.
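
A small sketch for checking the effect on your own objects. Here fit1 and fit2 are stand-in objects, not fitted models, and the temporary file names are only for illustration; how much xz gains depends on the data.

```r
# Stand-in objects; substitute the fitted models.
fit1 <- matrix(rep(seq_len(1000), 200), ncol = 100)
fit2 <- data.frame(id = rep(1:500, each = 20), value = round(sin(1:10000), 2))

f_gz <- tempfile(fileext = ".RData")
f_xz <- tempfile(fileext = ".RData")

# Default (gzip) compression versus xz compression.
save(fit1, fit2, file = f_gz)
save(fit1, fit2, file = f_xz, compress = "xz")

# File sizes in bytes.
print(c(gzip = file.size(f_gz), xz = file.size(f_xz)))

# Loading works the same way regardless of the compression used.
restored <- new.env()
load(f_xz, envir = restored)
```
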
> mer.clean = function(x) {
>        x@frame=data.frame()
>        x@A=new("dgCMatrix")
>        x@Zt=new("dgCMatrix")
>        x@X=matrix()
>        x@mu=as.numeric(NA)
>        x@muEta=as.numeric(NA)
>        x@resid=as.numeric(NA)
>        x@y=as.numeric(NA)
>        x@eta=as.numeric(NA)
>        x@pWt=as.numeric(NA)
>        x@Cx=as.numeric(NA)
>        x@var=as.numeric(NA)
>        x@sqrtXWt=matrix()
>        x@sqrtrWt=as.numeric(NA)
>        x@flist=data.frame()
>        x
> }

Rather than rewriting all of those slots I would just create a new
object that had the slots that I wanted to save.  Modifying the slots
in an object will probably result in something that does not satisfy
validObject() so it will not be of much value to load the saved object
as it will fail in most methods for the mer class.
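
One way to do that, sketched under the assumption of a current lme4 (1.0 or later, where these accessors exist): pull the pieces needed for post-processing into a plain list and save that instead of the fitted object. The function name slim_fit and the choice of components are hypothetical; the model uses the cbpp data shipped with lme4 only so that the example runs.

```r
library(lme4)

# Small example fit on a dataset that ships with lme4.
fit <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)

# Collect only the summaries wanted later; adjust the list as needed.
slim_fit <- function(fit) {
  list(
    formula = deparse(formula(fit)),            # model formula as text
    fixef   = fixef(fit),                       # fixed-effects estimates
    vcov    = as.matrix(vcov(fit)),             # their covariance matrix
    varcor  = as.data.frame(VarCorr(fit)),      # variance components
    ranef   = ranef(fit, condVar = FALSE),      # conditional modes
    logLik  = as.numeric(logLik(fit)),
    nobs    = nobs(fit)
  )
}

small <- slim_fit(fit)

# Compare in-memory sizes (bytes) and save the reduced object.
print(c(full = as.numeric(object.size(fit)),
        slim = as.numeric(object.size(small))))
out_file <- tempfile(fileext = ".rds")
saveRDS(small, out_file, compress = "xz")
```

The saved list cannot be passed to methods for the fitted-model class, but it is always a valid object and contains exactly what was put into it.
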
> Thanks!
> Joscha
>
> ps: From the description of this list I understand that it is for development related discussions. When I read threads about lme4 on the main R list, people always get referred to this list though. So feel free to refer me to another list.
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>



