[R-sig-ME] Another predict/cloglog peculiarity --- lme4 this time (PS)

Sun Apr 5 12:52:51 CEST 2020

Apologies.  I see that my Mac mail software has interpreted 'make.link’
as a link that is underlined — somewhere between going out and being
received at the other end (at least at my end), it was then mangled in
the attempt to give a presumed underlying web address.  Hopefully,
the following should be OK.

The issue under discussion related to any calculation that works with
a cloglog link function, not just mixed models.

The ‘crude' way to calculate the inverse function, for eta=3.5, would be:

> cllinv <- function(eta) <- 1-exp(-exp(eta))
> 
Observe the following
> options(digits=17)
> 
> rlinkinv <- make.link('cloglog')$linkinv
> cllinv <- function(eta)1-exp(-exp(eta))
> 
> cll <- function(mu)log(1-log(mu))
> 
> mat <- matrix(nrow=6, ncol=3)
> colnames(mat)<- c('1-exp(1-exp(eta))','R linkinv()', 'Calc as 1 -')
> rownames(mat) <- paste(seq(from=3.5, to=4, by=0.1))
> i <- 0
> for(eta in seq(from=3.5, to=4, by=0.1)){
>   i <- i+1
>   mat[i,] <- c(cllinv(eta), rlinkinv(eta), exp(1-exp(eta)))
> }
> print(mat)

                 1-exp(1-exp(eta))                      R linkinv()                          Calc as 1 -
  3.5 0.99999999999999589 0.99999999999999589 1.1283307669739036e-14
  3.6 0.99999999999999989 0.99999999999999978 3.4664362337169868e-16
  3.7 1.00000000000000000 0.99999999999999978 7.3833488826673409e-18
  3.8 1.00000000000000000 0.99999999999999978 1.0490996011645160e-19
  3.9 1.00000000000000000 0.99999999999999978 9.5297924643308894e-22
  4    1.00000000000000000 0.99999999999999978 5.2798210162856661e-24

I’d judge that the maximum returned by rlinkinv = make.link('cloglog')$linkinv
is set to ensure that log(-log(1-mu)) with mu=1 never appears in the calculations,
allowing (but this should at least be documented, and perhaps generate a
warning):

> cll(rlinkinv(3.7))
  [1] 3.5847307979997631

It is important that predicted values on the scale of the linear predictor
are not thus constrained  — it can, then, be fatal to calculate them using 
rlinkinv(predict(obj,  type‘response’)) !

John Maindonald             email: john.maindonald using anu.edu.au

> On 5/04/2020, at 16:07, Rolf Turner <r.turner using auckland.ac.nz> wrote:
> 
> 
> On 3/04/20 8:00 pm, Thierry Onkelinx wrote:
> 
>> Dear Rolf,
>> This is due to the precision of floating point numbers. Let's have a look at what happens when we get the extreme negative difference.
>> > i <- which.min(g(predResp) - predLink)
>> > c(predResp[i], 1 - exp(-exp(predLink[i])))
>> 76 76
>>  1  1
>> > c(1 - predResp[i], exp(-exp(predLink[i])))
>>           76           76
>> 2.220446e-16 0.000000e+00
>> > c(-log(1 - predResp[i]), exp(predLink[i]))
>>         76         76
>>   36.04365 3602.72298
>> > c(log(-log(1 - predResp[i])), predLink[i])
>>       76       76
>> 3.584731 8.189445
> 
> Thanks Thierry.  I think it is at last becoming clear to me.  Sorry for taking so long to reply, but I've been thrashing around, trying to express things in my own words, in a way that I can understand.
> 
> My attempt to do so follows.  Most of you will probably find that what I have to say simply obfuscates the issue. You are probably best advised to just read what Thierry has written above and ignore my ramblings. But just in case anyone is interested, here are my thoughts:
> 
> Basically predResp is exactly ginv(predLink); no problems with numerical delicacy there.
> 
> So when I looked at
> 
>    g(predResp) - predLink
> 
> I was actually looking at
> 
>    g(ginv(predLink)) - predLink
> 
> The problem is that g(ginv(x)) is NOT equal to x, when x gets bigger than about 3.5, due to numerical delicacy.  (For x < 3.5, g(ginv(x)) is
> pretty close to x just as the young and naïve, such as my very good self --- :-) --- would expect.)
> 
> For x > 3.5, ginv(x) becomes numerically indistinguishable from 1, whence g(ginv(x)) is g(1) = Inf.  Or it *would* be except for the fact that ginv() (for the cloglog link) is designed so that its values are capped at 1 - .Machine$double.eps = 2.220446e-16 (on my machine at least).
> 
> Consequently g(ginv(x)) is capped at g(1 - 2.220446e-16) = 3.584731.
> So as x grows it quickly outstrips g(ginv(x)).
> 
> When x is close to 3.5 g(ginv(x)) is simply a bit numerically wobbly, and not necessarily smaller than x. (Which is why I got a negative value, explicitly -0.00281936, for the lower bound of the range of
> g(predResp) - predLink.
> 
> Another person, who replied to me off-list, suggested plotting
> g(predResp) ~ predLink .  This is indeed illuminating.
> 
> I think that the main thing to take away from all this is that numerical delicacy is always lurking in the bushes, and care should always be taken, especially when logs and exponentials are involved.  It is perilous to assume that *algebraic* identities, like g(ginv(x)) = x,
> will turn out to be *numerically* true.
> 
> Thanks again to Thierry.
> 
> cheers,
> 
> Rolf
> 
> -- 
> Honorary Research Fellow
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models