[R] glmmPQL() and memory limitations
Thomas W Blackwell
tblackw at umich.edu
Tue Aug 19 01:07:44 CEST 2003
Elliott -
I don't know whether you've had any other responses off-list yet;
none have shown up on the r-help mailing list today.
I'm really NOT the most expert person to answer this, but I'll give
it a try.
Your option (1) seems entirely possible to me.
Let me do some thinking out loud to see how the numbers add up.
The design matrix for the fixed effects should have dimensions
11,000 rows x ((6 + 6) * 2) = 24 columns, or about 264 K values.
The design matrix for the random effects might have dimensions
11,000 rows x ((1 + (6 + 6) * 2) * 16) = 400 columns, or about
4.4 M values.
Say 4.7 M values, total. At worst, these will be stored as
8-byte double precision numbers (they very likely are), so about
38 Mb for one copy of the logistic regression problem. Ah, but
then I look at the error message you quote below, and there's some
single object of 62 Mb that R is manipulating. So my calculation
above is low by a factor of 1.6 or so.
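If you want to check this arithmetic against your own data,
something like the following should do it. (A sketch only; "dat"
stands in for whatever your data frame is actually called.)

    ## size of the fixed-effects design matrix for your formula;
    ## "dat" is a placeholder for your data frame
    X <- model.matrix(~ (ordered(Stop) + ordered(Son)) * StopResp,
                      data = dat)
    dim(X)                    # rows x columns
    object.size(X) / 1024^2   # Mb for one copy of X

That only covers the fixed effects; the random-effects matrix,
being roughly 16 times wider, is where the bulk of the memory goes.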
R wants quite a lot of space to turn around in. I usually figure
4 copies of the data just to do the simplest arithmetic and assign
the result. The function glmmPQL() might be keeping 10 or 20
copies of the regression problem around, but that's at most 760 Mb
(assuming 38 Mb each), so if that were all, you would be okay.
If each node is running an instance of the problem on both
processors, then each instance has only 1 Gb to work with, and
you're pretty close to the limit once you count R's overhead and
the operating system's overhead.
If there's a way to keep one processor empty on each node, that
would double the memory available to each instance of the problem
(but it ONLY doubles it).
I observe that 11,000 rows >> 6 * 6 * 2 * 16 = 1152 cells. That
suggests there might be a way to collapse the multiple Bernoulli
outcomes at each combination of Subject, Stop, Son and StopResp
into a binomial outcome (# successes, # failures), as for glm().
I don't know whether glmmPQL() supports this response data
format. (See "Details" in help("glm") to see what I'm talking
about.) If you are able to do this, it could reduce the size of
the random factor design matrix proportionately.
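For the fixed-effects part, the collapsing I have in mind would
look roughly like this with glm(). (Again, "dat" is my placeholder
name, SonResp is assumed to be coded 0/1, and I make no promise
that glmmPQL() accepts the same two-column response.)

    ## collapse 0/1 responses to (successes, failures) per cell
    cells <- aggregate(dat$SonResp,
                       by = list(Subj = dat$Subj, Stop = dat$Stop,
                                 Son = dat$Son,
                                 StopResp = dat$StopResp),
                       FUN = sum)
    names(cells)[names(cells) == "x"] <- "succ"
    cells$n <- aggregate(dat$SonResp,
                         by = list(dat$Subj, dat$Stop,
                                   dat$Son, dat$StopResp),
                         FUN = length)$x
    ## two-column response, per "Details" in help("glm")
    fit <- glm(cbind(succ, n - succ) ~
                   (ordered(Stop) + ordered(Son)) * StopResp,
               family = binomial, data = cells)

That would cut the 11,000 rows down to at most 1152.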
For single-processor implementations of R, the information you
might want is on the help pages help("Memory") and help("gc").
I've NO experience with threaded versions and how they behave.
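For instance, gc() reports how much memory R is holding as you go,
and R can be started with an explicit memory ceiling. (A sketch;
check help("Memory") for the exact options your version supports.)

    gc()   # report current memory use; also triggers a collection
    ## starting R with an explicit limit, for example:
    ##   R --max-vsize=1500M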
Note, too, that the error message you quote below describes only
the last allocation request, the one that failed. It doesn't tell
you how much memory had already been allocated successfully in
previous requests. So it's not that a single call for 62 Mb fails
on its own; it fails on top of everything already in use.
Guess I've come to the end of whatever slight help I can offer.
Please do come back and tell us how this question ultimately turns
out. And if you have had other off-list responses during the day,
you might summarize them in an email back to the list so that the
rest of us know that your question is being dealt with
appropriately.
- tom blackwell - u michigan medical school - ann arbor -
On Mon, 18 Aug 2003, Elliott Moreton wrote:
> When running glmmPQL(), I keep getting errors like
>
> Error: cannot allocate vector of size 61965 Kb
> Execution halted
>
> This is R-1.7.1. The data set consists of about 11,000 binary responses
> from 16 subjects. The model is
>
> fixed =
> SonResp ~ (ordered (Stop) + ordered (Son)) * StopResp,
>
> random =
> ~ 1 + (ordered (Stop) + ordered (Son)) * StopResp | Subj
>
> family = binomial (link = logit)
>
> SonResp and StopResp are binary; Stop and Son are ordered factors with six
> levels each.
>
> The machine I'm running this on is my university's scientific server, a
> Beowulf Linux cluster; the machine this job would be running on would have
> two 1.4 GHz CPUs, 2 gigabytes of RAM, and an 18-gigabyte hard disk, plus
> 130 gigabytes of scratch file space; it would be running Red Hat Linux 7.2
> with XFS.
>
> Can anyone tell me whether this is (1) a problem with the model (no
> machine could fit it in the lifetime of the universe), (2) a problem with
> how I formulated the model (there's a way to get the same end result
> without overflowing memory), (3) a problem with glmmPQL() (that could be
> fixed by using some other package), (4) a problem with the machine I'm
> running it on (need more real or virtual memory), or (5) other?
> (Naturally, I've contacted the system administrators to ask them the same
> thing, but I don't know how much they know about R.)
>
> Many thanks in advance,
> Elliott Moreton