[R-sig-ME] performance of lmer with large datasets

jpl2136 at columbia.edu jpl2136 at columbia.edu
Fri Sep 16 17:46:40 CEST 2011


Hi, 

I am running some pretty big multilevel models using glmer right now, and they are taking forever, so I would really like to speed things up.
Here is the data setup: I have around 130 datasets on the dyad level with 3,600 to 4,000,000 observations, the outcome is binary, I am using two cross-classified random intercepts (for i and j in the dyad) with equal group sizes, and about 35 covariates. For each dataset, I am currently fitting 4 models with slightly different specifications and there will be more. Right now, I am running all this in the cloud on 9 computers. I started the job over 4 days ago and I still only have results for 26 of the datasets. 

Here are some thoughts about speeding this up and I welcome any suggestions:
1) I could use a Beowulf cluster to send each dataset to one CPU. I actually have access to one, but it would require some work on the server side and I don't want to burden the cluster administrator too much. I am also a little worried about memory, because some nodes with multiple CPUs would fit the models for different datasets at the same time. 
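For what it's worth, the per-dataset parallelism in (1) can be sketched with base R's parallel package. The fit_one() helper and its simulated data below are purely illustrative stand-ins for loading one dataset and calling glmer() on it:

```r
library(parallel)

# Toy stand-in for one dataset's model fit; in practice this would load
# the dataset and call glmer() with the cross-classified random intercepts.
fit_one <- function(seed) {
  set.seed(seed)
  n <- 1000
  x <- rnorm(n)
  y <- rbinom(n, 1, plogis(0.5 * x))
  glm(y ~ x, family = binomial)
}

# One worker per dataset, up to the number of cores available.
fits <- mclapply(1:4, fit_one, mc.cores = 2)
length(fits)  # one fitted model per "dataset"
```

Note that mclapply() forks and so is Unix-only; parLapply() over a PSOCK cluster is the portable alternative, and it also keeps each dataset's memory in a separate process, which may help with the memory worry above.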

2) I could replace one of the random effects with a fixed effect (which I would actually prefer). But I think that is computationally more demanding as long as I use dummies to do it. Is there an easy way to implement the standard fixed-effects data transformation to speed this up? Any existing implementations?
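For the linear case, the standard fixed-effects (within) transformation mentioned in (2) demeans the outcome and covariates by group, which reproduces the dummy-variable slope without ever building the dummies. Note this equivalence is exact only for linear models; for a binary outcome it does not carry over, and a conditional likelihood (conditional logit) would be needed instead. A minimal sketch on simulated data:

```r
set.seed(1)
n_id <- 50; n_per <- 20
id <- rep(seq_len(n_id), each = n_per)
alpha <- rnorm(n_id)[id]            # group fixed effects
x <- rnorm(n_id * n_per)
y <- alpha + 0.5 * x + rnorm(n_id * n_per)

# Dummy-variable approach: one coefficient per group
b_dummy <- coef(lm(y ~ x + factor(id)))["x"]

# Within transformation: demean y and x by group, no dummies needed
y_w <- y - ave(y, id)
x_w <- x - ave(x, id)
b_within <- coef(lm(y_w ~ x_w - 1))["x_w"]

all.equal(unname(b_dummy), unname(b_within))  # TRUE
```

The two slopes agree by the Frisch–Waugh–Lovell theorem, but the demeaned fit avoids inverting a design matrix with thousands of dummy columns.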

3) Speeding up the fitting process itself. I am not sure about this option and would be happy about any suggestions. 

4) The 'mer' objects get pretty big and I would like to reduce their size to make later post-processing easier. I am using the function below to do that, which reduces the object size in R to ~5% of the original. So everything seems fine, but when I save the 'mer' objects using save(fit1,fit2,file=[PATH]), the RData file is still up to 400 MB, which is roughly the size of the R objects before using the mer.clean function. Any suggestions?

mer.clean = function(x) {
	# Empty out the large slots of a fitted 'mer' object that are only
	# needed during fitting, keeping the estimates for post-processing.
	x@frame   = data.frame()
	x@A       = new("dgCMatrix")
	x@Zt      = new("dgCMatrix")
	x@X       = matrix()
	x@mu      = as.numeric(NA)
	x@muEta   = as.numeric(NA)
	x@resid   = as.numeric(NA)
	x@y       = as.numeric(NA)
	x@eta     = as.numeric(NA)
	x@pWt     = as.numeric(NA)
	x@Cx      = as.numeric(NA)
	x@var     = as.numeric(NA)
	x@sqrtXWt = matrix()
	x@sqrtrWt = as.numeric(NA)
	x@flist   = data.frame()
	x
}
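One possible culprit for the save() size (a guess, not verified against the 'mer' internals): formulas and calls stored inside a fitted object carry their enclosing environment, and save() serializes that environment, including any large objects it contains, even after the slots above are emptied. The effect is easy to reproduce in base R:

```r
# A formula created inside a local scope captures that environment,
# including 'big', so serializing the formula drags 'big' along.
f <- local({
  big <- rnorm(1e6)
  y ~ x
})
size_before <- length(serialize(f, NULL))

# Re-pointing the formula at the global environment (which is stored
# as a reference, not by value) drops the captured data.
environment(f) <- globalenv()
size_after <- length(serialize(f, NULL))

c(before = size_before, after = size_after)
```

If this is what is happening, resetting the environments of the formula/call slots before save() should shrink the file; independently, save(..., compress = "xz") trades CPU time for a smaller file.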

Thanks!
Joscha

PS: From the description of this list I understand that it is for development-related discussions. When I read threads about lme4 on the main R list, though, people always get referred to this list. So feel free to refer me to another list.



