[R-sig-ME] Model specification help

Andrew Perrin clists at perrin.socsci.unc.edu
Fri Mar 9 18:46:46 CET 2007


On Fri, 9 Mar 2007, Douglas Bates wrote:

> On 3/9/07, Andrew Perrin <clists at perrin.socsci.unc.edu> wrote:
>> Update: I decided to run the lmer and lmer2 versions of the code you
>> suggested simultaneously, on two machines:
>> 
>> > grades.lmer<-lmer(grade.pt ~ (1|stud.id) + (1|instr.id) + (1|cour.dep),
>> +                   newgrades.stripped.df,
>> +                   control =
>> +                   list(gradient = FALSE, niterEM = 0, msVerbose = 1)
>> +                   )
>> 
>> 
>> They are still working their way through, but I thought it was interesting
>> that (a) lmer2 seems to be using less RAM, by roughly 0.3G; (b) even lmer
>> seems well within the 3G limit, maxing out at about 1.6G so far; and (c)
>> for the first iteration, there are both similarities and differences:
>
> That's great news.  Would you be willing to give us an idea of the
> number of observations and the number of levels for each of the
> groups?  I don't imagine that this would violate confidentiality but
> if you have misgivings it would be helpful even to have ballpark
> estimates of the size of the problem.


Sure, I don't think there's any confidentiality issue at this level of 
abstraction.  It's about 1.7 million observations on 54,711 unique 
students in 70,366 unique course sections. These are in 106 different 
departments taught by 7,964 unique instructors.

I will post comparative results whenever they're complete and I'm back in 
the office - likely Monday.

Thanks,
Andy

> length(newgrades.stripped.df$stud.id)
[1] 1721024
> length(levels(as.factor(newgrades.stripped.df$stud.id)))
[1] 54711
> length(levels(as.factor(newgrades.stripped.df$section)))
[1] 70366
> length(levels(as.factor(newgrades.stripped.df$instr.id)))
[1] 7964
> length(levels(as.factor(newgrades.stripped.df$cour.dep)))
[1] 106
>
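A more compact way to get the same counts (a sketch, assuming the same data
frame and no missing values in the id columns) would be something like:

> sapply(newgrades.stripped.df[c("stud.id", "section", "instr.id", "cour.dep")],
+        function(x) length(unique(x)))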


----------------------------------------------------------------------
Andrew J Perrin - andrew_perrin (at) unc.edu - http://perrin.socsci.unc.edu
Assistant Professor of Sociology; Book Review Editor, _Social Forces_
University of North Carolina - CB#3210, Chapel Hill, NC 27599-3210 USA
New Book: http://www.press.uchicago.edu/cgi-bin/hfs.cgi/00/178592.ctl

>
>> lmer:    0  3.66128e+06: 0.0865649 0.0125233 0.000161387
>> lmer2:   0  3.66128e+06: 0.294219 0.111907 0.0127038
>
> That's to be expected.  For a problem like this the parameters
> optimized in lmer2 are on the scale of the relative standard deviation
> of the random effects (relative to the standard deviation of the
> residual noise component) whereas the parameters optimized in lmer are
> on the scale of the relative variance.  The starting values are
> calculated the same way, so at the first iteration the lmer values are
> simply the squares of the lmer2 values:
>
>>  c(0.294219, 0.111907, 0.0127038)^2
> [1] 0.0865648200 0.0125231766 0.0001613865
>
> After the first iteration they will no longer correspond because the
> iteration trajectories will be affected by the scaling.  I hope it
> will be the case that lmer2 converges in fewer iterations than does
> lmer.  I would appreciate knowing what the results are.
>
>> 
>> (since I don't know what that diagnostic means, I can't determine whether
>> to be worried about the difference or not!)
>> 
>> More when the models finish.
>> 
>> Andy
>> 
>> ----------------------------------------------------------------------
>> Andrew J Perrin - andrew_perrin (at) unc.edu - http://perrin.socsci.unc.edu
>> Assistant Professor of Sociology; Book Review Editor, _Social Forces_
>> University of North Carolina - CB#3210, Chapel Hill, NC 27599-3210 USA
>> New Book: http://www.press.uchicago.edu/cgi-bin/hfs.cgi/00/178592.ctl
>> 
>> 
>> 
>> On Fri, 9 Mar 2007, Douglas Bates wrote:
>> 
>> > On 3/8/07, Andrew Perrin <clists at perrin.socsci.unc.edu> wrote:
>> >> On Thu, 8 Mar 2007, elw at stderr.org wrote:
>> >>
>> >> >
>> >> >> Thank you for this. I will return to it tomorrow and let you know how
>> >> >> it goes. As for the machine it's running on: it's a dual-Xeon 2.8Ghz
>> >> >> IBM eseries server with 6GB RAM, running debian Linux, kernel 2.6.18.
>> >> >> So the 3GB per-process memory limit applies. I also have access to a
>> >> >> shared server with "twenty-four 1.05 GHz Ultra-Sparc III+ processors
>> >> >> and 40 GB of main memory" running solaris, if that's better.
>> >> >
>> >> > Andrew,
>> >> >
>> >> > That gets you onto a 64-bit platform, beyond the 32-bit-Intel 4GB
>> >> > memory (3G for user process, 1G for OS kernel) limit, and beyond a
>> >> > bunch of other data size limits.  The memory bandwidth available to
>> >> > you on the Solaris machine is also likely to be much more significant
>> >> > - something that you will find quite pleasant for even some more
>> >> > trivial analyses.  :)
>> >> >
>> >> > Much better, certainly!  [And very much like what 'beefy' R code is
>> >> > most frequently run on...]
>> >> >
>> >> > W.r.t. the eSeries server you're commonly running on now - if you can
>> >> > have your systems people check to make sure that you have a
>> >> > PAE-enabled linux kernel running, you might be able to muscle past the
>> >> > 3GB mark with a single R process.... with some work.
>> >> >
>> >> > [If the machine can actually "see" all 6GB of memory, you probably
>> >> > have a PAE kernel.]
>> >> >
>> >> > --e
>> >> >
>> >>
>> >> Ironically enough, I *am* the systems people for the eSeries.... having
>> >> been a unix sysadmin and perl programmer before cutting and running for
>> >> social science :).. The kernel is PAE enabled, but that only helps with
>> >> seeing 6G altogether, not over 3G for a single process. I toyed with the
>> >> idea of whether I could break down the process into several threaded
>> >> ones, but that's way above my head.
>> >>
>> >> (The Solaris cluster is university-run, though.)
>> >
>> > I haven't done a thorough analysis of the memory usage in lmer but I
>> > can make some informed guesses as to where memory can be saved.  The
>> > details of the implementation and the slots in the internal
>> > representation of the model are given in the "Implementation" vignette
>> > in the lme4 package.  At present there is only one small example shown
>> > in there but I will add others.
>> >
>> > For the model fitting process itself the largest object needed is the
>> > symmetric sparse matrix in the A slot and the Cholesky factor of the
>> > updated A*.  The dimension of that square matrix is the sum of the
>> > sizes of the random effects vector and the fixed effects vector plus 1
>> > (for the response).  Generally the Cholesky factor will be slightly
>> > larger than the A but care is taken to make the Cholesky factor as
>> > small as possible.
>> >
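For scale, here is a back-of-the-envelope version of that calculation for the
intercept-only model quoted earlier in this thread (a sketch; it assumes a
single fixed effect, the intercept):

> ## random effects (students + instructors + departments) + fixed effects + response
> 54711 + 7964 + 106 + 1 + 1
[1] 62783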
>> > I enclose an example from fitting a model with two random effects per
>> > student, one random effect per teacher and two random effects per
>> > school to the star (Tennessee's Student-Teacher Achievement Ratio
>> > study) data.  The dimension of the random effects will be 2*10732 +
>> > 1374 + 2 * 80 so that easily dominates the dimension of A.
>> >
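(For reference, that random-effects dimension works out to:)

> 2*10732 + 1374 + 2*80
[1] 22998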
>> > In this case the sizes of the slots L, A, ZXyt and frame are
>> > comparable.  However, if we strip things down to the bare essentials
>> > we don't need ZXyt, frame, flist, offset and weights after the matrix
>> > A has been constructed.
>> >
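One way to see those relative sizes on a fitted object is to compare the slots
directly (a sketch; `fm` stands for a hypothetical fitted model, and the slot
names follow the Implementation vignette, so they may differ between lme4
versions):

> ## approximate memory footprint, in bytes, of each named slot of `fm`
> sapply(c("A", "L", "ZXyt", "frame", "flist"),
+        function(nm) object.size(slot(fm, nm)))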
>> > The dimension of the matrices L and A is dominated by the dimension of
>> > the random effects vector.  The dimension of ZXyt, etc. involves the
>> > number of observations.  This might be good news in your case in that
>> > the sizes of the parts that must be preserved are dominated by the
>> > number of students and not the number of grades recorded.
>> >
>> 
>



