[R-sig-ME] large dataset

Dan Pemstein dbp at uiuc.edu
Wed Jan 31 00:16:52 CET 2007


Hi all,

I'm attempting to fit a crossed random effects model to a rather large
data set.  This is EU parliament voting data (the response variable is
binary) from 574 legislators over 2123 votes.  EU parliamentarians
miss a lot of votes so there are ~700,000 total observations.  The
model also includes quite a few covariates, on the order of 30-50
(mostly fixed effects for country, party, etc.), depending on the
particular specification.  I'm having serious trouble fitting a
crossed effects logit model to these data with lme4 without exhausting
system memory.  I have a quad-core Intel Linux machine with 8 GB of
RAM and a lot of swap to play with, but I'm still falling short.
Interestingly, I've successfully fit this model using HLM6 on a
machine with substantially less RAM.

My question is largely about feasibility.  I would like to use lme4 to
analyze this dataset because it provides a much better set of features
for checking model fit and generating predictions than HLM (one can't
even get the fixed effects variance-covariance matrix out of HLM6's
crossed effects routine).  Is fitting a model of this size with lme4
simply infeasible?  Are there any ways to reduce lmer's memory
footprint that I might try?  Would one expect a cross-classified logit
model with 700,000 observations to require upwards of 12 GB of memory,
or have I uncovered a small memory leak that isn't visible with
smaller datasets?  The memory use creeps up
slowly over the course of a run, which is at least consistent with a
memory leak, but, not knowing anything about the implementation, I'm
just speculating wildly here.  Obviously, I could sub-sample, but this
is already a sample of a larger dataset, so I'm loath to do that if I
can avoid it.
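
For a rough sense of scale, here is a back-of-envelope sketch in Python
that uses only the figures quoted above.  It says nothing about what
lme4 actually allocates internally.  The assumptions are mine: 8-byte
doubles, 4-byte integer indices, 50 fixed-effect columns (the upper end
of the 30-50 range), one random intercept per legislator and per vote,
and a compressed-sparse-column layout for the sparse case.  The
function names are hypothetical.

```python
# Back-of-envelope storage estimates from the figures quoted in the post.
# Assumptions (not taken from lme4's source): 8-byte doubles, 4-byte
# integer indices, 50 fixed-effect columns, and one random intercept per
# legislator and per vote.

N_OBS = 700_000        # approximate number of observed votes
N_LEGISLATORS = 574
N_VOTES = 2_123
N_FIXED = 50           # upper end of the 30-50 covariate range

BYTES_DOUBLE = 8
BYTES_INT = 4
GIB = 1024 ** 3


def dense_bytes(rows, cols):
    """Bytes needed to store one dense rows x cols matrix of doubles."""
    return rows * cols * BYTES_DOUBLE


def sparse_csc_bytes(rows, cols, nonzeros_per_row):
    """Bytes for compressed-sparse-column storage:
    one double and one row index per nonzero, plus cols + 1 column pointers."""
    nnz = rows * nonzeros_per_row
    return nnz * (BYTES_DOUBLE + BYTES_INT) + (cols + 1) * BYTES_INT


# Crossed random intercepts: one indicator column per legislator and per vote.
n_random = N_LEGISLATORS + N_VOTES

fixed_dense = dense_bytes(N_OBS, N_FIXED)
random_dense = dense_bytes(N_OBS, n_random)
# Each observation belongs to exactly one legislator and one vote,
# so each row of the random-effects design has two nonzero entries.
random_sparse = sparse_csc_bytes(N_OBS, n_random, 2)

for label, size in [
    ("fixed-effects design, dense", fixed_dense),
    ("random-effects design, dense", random_dense),
    ("random-effects design, sparse", random_sparse),
]:
    print(f"{label:<32s} {size / GIB:8.3f} GiB")
```

Under these assumptions a single dense copy of the fixed-effects design
is roughly 0.26 GiB, while a dense random-effects design would be about
14 GiB, the same order of magnitude as the 12 or so gigabytes asked
about above.  The sparse version is a few tens of megabytes.  That
coincidence does not show where the memory is going; it only suggests
that a few dense intermediates or repeated copies of large objects
could plausibly account for the totals without any leak.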

thanks,

Dan

-- 
Daniel Pemstein
Department of Political Science
University of Illinois at Urbana-Champaign
702 S. Wright St.
Urbana, IL 61801

Email: dbp at uiuc.edu

