[R-sig-ME] calculation max|grad value?

Mon Apr 13 23:25:17 CEST 2015

On 15-04-13 08:41 AM, Ben Pelzer wrote:
> Hi Dan,
> 
> Thanks for pointing me to that formula. Am I right that in case of
> only one parameter (say a fixed intercept only), this measure means
> that the gradient of the intercept-estimate is divided by the
> (estimated) variance of the intercept? Could you explain to me the
> rationale behind this measure or to put it differently: why is it
> meaningful to use it as a criterion to evaluate the quality of the
> convergence?
> 
> In case of more than one parameter, it's even less clear to me what
> the measure exactly expresses and thus why it is used by glmer.
> 
> Thanks again for any help/explanation,
> 
> Ben Pelzer.

  As Doug Bates pointed out and explained, we should really be using
solve(chol(Hessian), gradient) instead (I started this e-mail this
morning and have had the compose window sitting open all day).
Heuristically, what this means is that we're estimating the expected
change in the deviance over the scale of one standard error of the
parameter, rather than over the scale of one unit of the parameter (as
Doug points out, the latter can be rather arbitrary).
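
  To make the difference between the scalings concrete, here is a
minimal base-R sketch.  The Hessian and gradient are made-up toy
numbers (hess, grad, h1 and g1 are illustrative names, not values from
any fitted model); with a real fit they would come from the
optinfo$derivs slot quoted further down in this thread.  The sketch
assumes the derivatives are those of the deviance.

```r
# Toy numbers only: a 2-parameter Hessian and gradient of the deviance.
hess <- matrix(c(400, 20,
                  20,  4), nrow = 2, byrow = TRUE)
grad <- c(2, 0.5)

# Scaling quoted earlier in the thread: solve(Hessian, gradient),
# i.e. the Newton step H^{-1} g.
newton_scaled <- solve(hess, grad)

# Scaling suggested in this reply: solve against the Cholesky factor of H.
chol_scaled <- solve(chol(hess), grad)

# Largest absolute component under each scaling.
max_abs_grad <- c(raw    = max(abs(grad)),
                  newton = max(abs(newton_scaled)),
                  chol   = max(abs(chol_scaled)))
print(max_abs_grad)

# One-parameter case: the two scalings reduce to g / h and g / sqrt(h).
h1 <- 400
g1 <- 2
one_par <- c(newton = drop(solve(matrix(h1), g1)),
             chol   = drop(solve(chol(matrix(h1)), g1)))
print(one_par)
```

  In the one-parameter case the Cholesky version divides the gradient
by sqrt(h).  Since the curvature h is inversely related to the
estimated variance, that amounts to multiplying the gradient by
something proportional to the standard error, which is the "change in
deviance per standard error" reading given above.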

  We know, and have known for a while, that the convergence criteria
we're currently using are rather dodgy, but we've been struggling with
what to replace them with.  We know there are lots of false positives
(convergence warnings for fits that are actually fine) for large data
sets (say nobs > 10^5), and we are trying to come up with reasonable,
simple criteria that will reduce the number of false positives without
completely scrapping the convergence checks.

  In the meantime, I would say that the gold standard (at this point)
for "is my fit really OK?" is whether you can refit with several
different optimizers (and possibly different starting points?) and get
practically the same result in each case; see e.g.
https://rstudio-pubs-static.s3.amazonaws.com/33653_57fc7b8e5d484c909b615d8633c01d51.html
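
  A minimal sketch of that check, assuming a reasonably recent lme4.
The cbpp data set that ships with lme4 stands in for your own data, and
summarise_fit is a hypothetical helper written only for this example.

```r
library(lme4)

# Fit once with one optimizer (cbpp is an example data set from lme4).
fit_bobyqa <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
                    data = cbpp, family = binomial,
                    control = glmerControl(optimizer = "bobyqa"))

# Refit the same model with a different optimizer.
fit_nm <- update(fit_bobyqa,
                 control = glmerControl(optimizer = "Nelder_Mead"))

# Hypothetical helper: collect the quantities worth comparing across fits.
summarise_fit <- function(fit) {
  c(fixef(fit),                       # fixed-effect estimates
    getME(fit, "theta"),              # random-effect (Cholesky) parameters
    logLik = as.numeric(logLik(fit))) # maximized log-likelihood
}

comparison <- rbind(bobyqa      = summarise_fit(fit_bobyqa),
                    Nelder_Mead = summarise_fit(fit_nm))
print(comparison)

# Largest absolute disagreement between the two fits.
max_diff <- max(abs(comparison["bobyqa", ] - comparison["Nelder_Mead", ]))
print(max_diff)
```

  If the rows agree to within a tolerance you consider negligible for
your application, a convergence warning from either fit is probably
nothing to worry about.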

> 
> 
> 
> On 11-4-2015 1:51, Daniel McCloy wrote:
> With a model called mod, you can get the relative gradient with
> 
> relgrad <- with(mod@optinfo$derivs, solve(Hessian, gradient))
> print(max(abs(relgrad)))
> 
> -- dan
> 
> Daniel McCloy
> http://dan.mccloy.info/
> Postdoctoral Research Fellow
> Institute for Learning and Brain Sciences
> University of Washington
> 
> 
> On 04/10/2015 06:54 PM, Ben Pelzer wrote:
>>>> Dear list,
>>>> 
>>>> For a given model in glmer (lme4_1.1-7), I got the warning 
>>>> message:
>>>> 
>>>> 3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
>>>>   Model failed to converge with max|grad| = 0.0601483 (tol = 0.001, component 17)
>>>> 
>>>> My model has 15 fixed effects and two (uncorrelated) random 
>>>> effects.
>>>> 
>>>> There has been a lot of correspondence about convergence
>>>> issues in the recent lme4 version(s) lately, but I cannot
>>>> easily find what measure the "max|grad" is exactly pointing
>>>> to.  If I'm right, it is the "relative gradient" of one of
>>>> the model parameters, apparently parameter 17. But how
>>>> exactly is this max|grad calculated? I found a command
>>>> (coming from Ben Bolker):
>>>> 
>>>> gg <- model7@optinfo$derivs$grad
>>>> 
>>>> which produces gradients that are much larger than
>>>> 0.0601483, probably since they are "absolute" gradients.
>>>> 
>>>> In the book of Schnabel et al. I found a definition of the
>>>> relative gradient in their equation (7.2.3):
>>>> 
>>>> Delta(f) * x  / f
>>>> 
>>>> which I believe must now be interpreted as
>>>> 
>>>> gradient * parameter estimate from glmer / log-likelihood
>>>> 
>>>> 
>>>> Is this indeed the formula that is used in lme4 to derive
>>>> the max|grad and is my interpretation of it correct? (I would
>>>> like to reproduce the max|grad value 0.0601483).
>>>> 
>>>> And which of the parameters in my model is actually
>>>> "component 17" (which the warning message refers to)?
>>>> 
>>>> Thanks for any help!
>>>> 
>>>> Ben Pelzer.
>>>> 
>>>> 
>>>> *--------------------------.
>>>> 
>>>> Below is part of the glmer output and also the result from
>>>> "gg <- model7@optinfo$derivs$grad"
>>>> 
>>>> Generalized linear mixed model fit by maximum likelihood (Laplace
>>>>   Approximation) [glmerMod]
>>>>  Family: binomial  ( logit )
>>>> Formula: bottom10readA ~ 1 + female2 + (-1 + female2 | Country33) +
>>>>     (1 | SCHOOLID2) + SES_mean_cen + age_cen + secondgen_mean +
>>>>     native_mean + Parliament2013_cen + WLMP_cen + HDI2012_cen +
>>>>     selage_cen + ce + ZSTAND2012C + Fselage2 + FCE2 + FZstand_pisa_cen2
>>>> Control: glmerControl(optimizer = "nloptwrap",
>>>>     optCtrl = list(algorithm = "NLOPT_LN_BOBYQA"))
>>>> 
>>>>      AIC      BIC   logLik deviance df.resid
>>>> 151434.4 151613.4 -75700.2 151400.4   276524
>>>> 
>>>> Scaled residuals:
>>>>     Min      1Q  Median      3Q     Max
>>>> -4.6982 -0.3104 -0.1819 -0.1126 10.6450
>>>> 
>>>> Random effects:
>>>>  Groups    Name        Variance Std.Dev.
>>>>  SCHOOLID2 (Intercept) 2.314767 1.52144
>>>>  Country33 female2     0.008527 0.09234
>>>> Number of obs: 276541, groups:  SCHOOLID2, 10643; Country33, 35
>>>> 
>>>> Fixed effects:
>>>>                      Estimate Std. Error z value Pr(>|z|)
>>>> (Intercept)        -2.1629201  0.1006349 -21.493  < 2e-16 ***
>>>> female2            -0.4316766  0.0523024  -8.253  < 2e-16 ***
>>>> SES_mean_cen       -0.3901277  0.0257537 -15.148  < 2e-16 ***
>>>> age_cen            -0.1685527  0.0256951  -6.560 5.39e-11 ***
>>>> secondgen_mean     -0.2462713  0.1269396  -1.940   0.0524 .
>>>> native_mean        -1.0927106  0.0844515 -12.939  < 2e-16 ***
>>>> Parliament2013_cen -0.0020840  0.0025656  -0.812   0.4166
>>>> WLMP_cen            0.0002831  0.0027028   0.105   0.9166
>>>> HDI2012_cen        -0.0338573  0.0600986  -0.563   0.5732
>>>> selage_cen          0.0525462  0.0119847   4.384 1.16e-05 ***
>>>> ce                 -0.0902947  0.0496913  -1.817   0.0692 .
>>>> ZSTAND2012C        -0.0457672  0.1760672  -0.260   0.7949
>>>> Fselage2           -0.0092435  0.0096429  -0.959   0.3378
>>>> FCE2               -0.0650998  0.0450328  -1.446   0.1483
>>>> FZstand_pisa_cen2  -0.4586711  0.1497851  -3.062   0.0022 **
>>>> ---
>>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>> 
>>>> 
>>>> And finally the 17 gradients:
>>>> 
>>>> gg
>>>> 
>>>>  [1]  -2.3293884   4.3723284  -5.6278026   0.2851749   1.6813773  -8.3454128
>>>>  [7]   4.1930703  -5.1109944  49.0449769 207.5065300  20.8115773 -31.4621360
>>>> [13]  14.0848733  -3.2661238 -24.9956165   7.0817152  -5.9149812
>>>> 
>>>> 
>>>> _______________________________________________ 
>>>> R-sig-mixed-models at r-project.org  mailing list 
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>> 
> 
