[R-sig-ME] Side effects from mcmcsamp()

Douglas Bates bates at stat.wisc.edu
Sat May 16 00:22:24 CEST 2009


On Fri, May 15, 2009 at 4:52 PM, Ben Bolker <bolker at ufl.edu> wrote:
> Juan Pedro Steibel wrote:
>> Hello,
>> I noticed this a wile ago. One thing that occurred to me was to copy to
>> another object before running mcmcsamp, example:
>> obmix0<-obmix
>> where obmix is the lmer object.
>> then do: mcmcsamp(obmix)....
>> For some reason obmix0 gets modified too.
>
>  Yeah, this aspect of the whole thing mystified me.
> I understand that mcmcsamp does nasty things internally,
> but I don't understand why it would also do them to a *copy* of the
> object.  I poked around in R manuals (language definitions etc.) but
> didn't manage to find anything useful for understanding this.  I can
> imagine there's some slick optimization where R says to itself "gee,
> this object hasn't been modified from its original yet, so I won't
> actually make a new copy ..." but is fooled by the internal
> manipulation.  ("Don't anthromorphize computers -- they get really mad"
> :-) )
>
>> gc()
>         used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 209832  5.7     467875 12.5   350000  9.4
> Vcells 476846  3.7    1165368  8.9  1028740  7.9
>
> ## make a big object -- 70 Mb
>> z <- runif(1e7)
>> gc()
>           used (Mb) gc trigger (Mb) max used (Mb)
> Ncells   209821  5.7     467875 12.5   350000  9.4
> Vcells 10476828 80.0   11888118 90.7 10477151 80.0
>
> ## make a copy
>> z2 <- z
>
> ## Vcells only increases a little
>> gc()
>           used (Mb) gc trigger (Mb) max used (Mb)
> Ncells   209826  5.7     467875 12.5   350000  9.4
> Vcells 10476829 80.0   12562523 95.9 10477151 80.0
>
> ## ditto
>> z3 <- z2
>> gc()
>           used (Mb) gc trigger  (Mb) max used (Mb)
> Ncells   209831  5.7     467875  12.5   350000  9.4
> Vcells 10476830 80.0   13270649 101.3 10477151 80.0
>
> ## tweak z3
>> z3[1] <- z3[1]+1
>> gc()
>           used  (Mb) gc trigger  (Mb) max used  (Mb)
> Ncells   209838   5.7     467875  12.5   350000   9.4
> Vcells 20476831 156.3   22913123 174.9 20476845 156.3
>
>
> ## now memory usage increases
>
>  I'm sure this is obvious to hardened R programmers, but I can't find
> any explicit discussion in the R manuals.
>
>  So I would imagine that making some trivial modification of the object
> (like  "class(object) <- class(object)", for example)  would mark it as
> a "new" object ...

Exactly.  There is some deep magic in the internal structure of an R
object that allows for "copy on modify" semantics.  Strictly speaking
in a functional language the arguments to a function (including a
replacement function) are copied and you never touch the original
objects that are passed as arguments.  Of course if you did this in
practice the interpreter would slow to a crawl because it would be
spending all its time copying objects then discarding these copies.
Remember that in R, "everything is a function" so evaluating 2+2 is
actually "+"(2,2).

To maintain some semblance of sanity there is a special tag in an R
object that indicates if it is named (i.e. has been assigned to a
name) and changes to such objects require that they be copied - unless
you change the object within a .Call.  The .Call semantics are very
powerful and give you enough rope to hang yourself.   Inside C code
called through .Call you get access to the original object and can do
whatever you want to it.

The C code in mcmcsamp manipulates the structure of the fitted model
to produce the sample but it doesn't copy the object.  I had thought
that I had restored the fitted model object passed to mcmcsamp to its
original form before returning from mcmcsamp but obviously I didn't.
I'll fix it next week.  (Right now I am staring at a stack of over 150
exam booklets to be graded and wondering if I should consider another
career.)

What is happening here is that assigning the fitted model to a new
name doesn't really copy the object.  It just flags it as needing to
be copied if either it or the original are changed and passing it to
mcmcsamp doesn't count as changing the object, even though it does
change the object.

So why don't I copy it?  Well, these objects can be very big and
people get distressed if they can fit a model but run out of memory
trying to do something with it.  I have already had complaints that it
is possible to fit a model but not evaluate a "summary" of the object
or even "print" (which causes a summary).  That's the tradeoff.
>  cheers
>     Ben
>>
>> It's been a while since I checked this so it may have been fixed.
>>
>> Anyways, a way around this is to extract all the info needed from your
>> lmer object to other variables before running mcmcsamp, that's what I do
>> now and it works fine.
>> Thanks
>> JP
>>
>>
>>
>> John Maindonald wrote:
>>> Dear Douglas -
>>> The following is at the very least a serious trap!  The output
>>> from VarCorr() and print() changes remarkably after running
>>> mcmcsamp() with the lmer object as argument.  The estimate
>>> of sigma is always (for these data) high, but roughly in the
>>> middle of the range of values that come out of ant111b.samp at sigma
>>>
>>> I have also been able to reproduce this behaviour
>>> (a) under lme4_0.999375-28 [below, I use 30]
>>> (b) under Mac OSX.  In fact, I thought initially that this
>>> was an OSX problem!
>>>
>>> While mcmcsamp() is in view, it would be very helpful
>>> to have a progress report on where it is at.
>>>
>>> Many thanks
>>> John Maindonald.
>>>
>>> I run the following code:
>>>
>>> library(DAAG)
>>> library(lme4)
>>> ant111b.lmer <- lmer(harvwt ~ 1 + (1 | site), data=ant111b)
>>> ant111b.lmer
>>> VarCorr(ant111b.lmer)
>>> ant111b.samp <- mcmcsamp(ant111b.lmer, n=1000)
>>>
>>> ## Now, extract the VarCorr and summary correlations
>>> VarCorr(ant111b.lmer)
>>> summary(ant111b.lmer)
>>>
>>>> ant111b.lmer <- lmer(harvwt ~ 1 + (1 | site), data=ant111b)
>>>> ant111b.lmer
>>> Linear mixed model fit by REML
>>> Formula: harvwt ~ 1 + (1 | site)
>>>   Data: ant111b
>>>   AIC   BIC logLik deviance REMLdev
>>> 100.4 104.8 -47.21    95.08   94.42
>>> Random effects:
>>> Groups   Name        Variance Std.Dev.
>>> site     (Intercept) 2.36773  1.53874
>>> Residual             0.57754  0.75996
>>> Number of obs: 32, groups: site, 8
>>>
>>> Fixed effects:
>>>            Estimate Std. Error t value
>>> (Intercept)   4.2917     0.5603   7.659
>>>> library(DAAG)
>>>> library(lme4)
>>>> ant111b.lmer <- lmer(harvwt ~ 1 + (1 | site), data=ant111b)
>>>> ant111b.lmer
>>> Linear mixed model fit by REML
>>> Formula: harvwt ~ 1 + (1 | site)
>>>   Data: ant111b
>>>   AIC   BIC logLik deviance REMLdev
>>> 100.4 104.8 -47.21    95.08   94.42
>>> Random effects:
>>> Groups   Name        Variance Std.Dev.
>>> site     (Intercept) 2.36773  1.53874
>>> Residual             0.57754  0.75996
>>> Number of obs: 32, groups: site, 8
>>>
>>> Fixed effects:
>>>            Estimate Std. Error t value
>>> (Intercept)   4.2917     0.5603   7.659
>>>> ant111b.samp <- mcmcsamp(ant111b.lmer, n=1000)
>>>> VarCorr(ant111b.lmer)
>>> $site
>>>            (Intercept)
>>> (Intercept)    8.363436
>>> attr(,"stddev")
>>> (Intercept)
>>>   2.891961
>>> attr(,"correlation")
>>>            (Intercept)
>>> (Intercept)           1
>>>
>>> attr(,"sc")
>>> sigmaREML
>>> 1.428289
>>>> summary(ant111b.lmer)
>>> Linear mixed model fit by REML
>>> Formula: harvwt ~ 1 + (1 | site)
>>>   Data: ant111b
>>>   AIC   BIC logLik deviance REMLdev
>>> 112.1 116.5 -53.04    105.7   106.1
>>> Random effects:
>>> Groups   Name        Variance Std.Dev.
>>> site     (Intercept) 8.3634   2.8920
>>> Residual             2.0400   1.4283
>>> Number of obs: 32, groups: site, 8
>>>
>>> Fixed effects:
>>>            Estimate Std. Error t value
>>> (Intercept)   4.2917     0.4171   10.29
>>>
>>>> sessionInfo()
>>> R version 2.9.0 (2009-04-17)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>>> States.1252;LC_MONETARY=English_United
>>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] lme4_0.999375-30   Matrix_0.999375-26 lattice_0.17-22
>>> [4] DAAG_0.99-1        MASS_7.2-47
>>>
>>> loaded via a namespace (and not attached):
>>> [1] grid_2.9.0
>>>
>>>
>>>
>>> John Maindonald             email: john.maindonald at anu.edu.au
>>> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
>>> Centre for Mathematics & Its Applications, Room 1194,
>>> John Dedman Mathematical Sciences Building (Building 27)
>>> Australian National University, Canberra ACT 0200.
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>>
>>
>>
>
>
> --
> Ben Bolker
> Associate professor, Biology Dep't, Univ. of Florida
> bolker at ufl.edu / www.zoology.ufl.edu/bolker
> GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>




More information about the R-sig-mixed-models mailing list