[R-sig-ME] calculation max|grad value?

Mon Apr 13 22:11:45 CEST 2015

On Mon, Apr 13, 2015 at 7:45 AM Ben Pelzer <b.pelzer at maw.ru.nl> wrote:

> Hi Dan,
>
> Thanks for pointing me to that formula. Am I right that in case of only
> one parameter (say a fixed intercept only), this measure means that the
> gradient of the intercept-estimate is divided by the (estimated)
> variance of the intercept? Could you explain to me the rationale behind
> this measure or to put it differently: why is it meaningful to use it as
> a criterion to evaluate the quality of the convergence?
>
> In case of more than one parameter, it's even less clear to me what the
> measure exactly expresses and thus why it is used by glmer.
>
> Thanks again for any help/explanation,
>
> Ben Pelzer.
>

There is some confusion here about what constitutes a parameter.  The
actual optimization in lmer is of a profiled log-likelihood that is a
function of variance component parameters only.  In the case of a model
with a scalar random effects term that parameter is the ratio of the
standard deviation of the random effects to the residual standard deviation.

See http://arxiv.org/abs/1406.5823 for more details.
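
As a minimal sketch of that ratio, assuming a random-intercept model for the sleepstudy data (the model formula and the names fm, theta_hat, sd_subject and ratio are chosen here purely for illustration), one can compare the optimizer's parameter with the ratio of the two standard deviations:

```r
library(lme4)

# Random-intercept model for the sleepstudy data shipped with lme4
# (illustrative model, not the one discussed in this thread)
fm <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# The single parameter over which the profiled criterion is optimized
theta_hat <- unname(getME(fm, "theta"))

# Ratio of the random-intercept standard deviation to the residual one
sd_subject <- unname(attr(VarCorr(fm)$Subject, "stddev"))
ratio <- sd_subject / sigma(fm)

# The two quantities should agree up to numerical tolerance
all.equal(theta_hat, ratio)
```

For models with vector-valued random effects theta has more than one element and this simple ratio no longer applies.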

The rationale for normalizing the gradient with respect to the Hessian is
to make it scale invariant.  (It turns out that the normalization used
wasn't the correct one to use but we will change that.)  Consider the
sleepstudy data in the lme4 package.  The response time is measured in
milliseconds, as is common in such experiments, and a typical response time
at the beginning of the experiment is around 250 ms.  Suppose that the
responses were converted to seconds, so the typical response time was 0.250
sec.  The scaling would affect the size of the gradient terms the same
way.  If the measurements were converted to microseconds all the gradient
terms would be multiplied by 1000.

The scaling is to make the quantity being compared to a fixed tolerance
dimensionless.  It is a basic principle that only dimensionless quantities
can be compared to a fixed tolerance.
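
A small base-R sketch of this idea follows. It assumes a toy quadratic objective (not lme4's deviance) whose overall scale is controlled by a hypothetical factor k, standing in for a change of units. The helper names grad_f and hess_f are made up for the illustration.

```r
# Toy objective: f(x) = k * ((x1 - 1)^2 + 10 * (x2 + 2)^2 + x1 * x2)
# Analytic gradient and Hessian of that function; k rescales the objective.
grad_f <- function(x, k = 1) {
  k * c(2 * (x[1] - 1) + x[2], 20 * (x[2] + 2) + x[1])
}
hess_f <- function(x, k = 1) {
  k * matrix(c(2, 1, 1, 20), nrow = 2)
}

x <- c(0.9, -1.8)  # an arbitrary point away from the optimum

g1 <- grad_f(x)
H1 <- hess_f(x)
gk <- grad_f(x, k = 1000)
Hk <- hess_f(x, k = 1000)

# The raw gradient depends on the scale of the objective ...
max(abs(gk)) / max(abs(g1))

# ... whereas the gradient scaled by the Hessian does not
rel1 <- solve(H1, g1)
relk <- solve(Hk, gk)
all.equal(rel1, relk)
```

Comparing max(abs(g1)) to a fixed tolerance would give a different answer after rescaling the objective, while max(abs(rel1)) and max(abs(relk)) agree.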

>
>
>
> On 11-4-2015 1:51, Daniel McCloy wrote:
> >
> > with a model called mod, you can get the relative gradient with
> >
> > relgrad <- with(mod@optinfo$derivs, solve(Hessian, gradient))
> > print(max(abs(relgrad)))
> >
> > -- dan
> >
> > Daniel McCloy
> > http://dan.mccloy.info/
> > Postdoctoral Research Fellow
> > Institute for Learning and Brain Sciences
> > University of Washington
> >
> >
> > On 04/10/2015 06:54 PM, Ben Pelzer wrote:
> >> Dear list,
> >>
> >> For a given model in glmer (lme4_1.1-7), I got the warning
> >> message:
> >>
> >> 3: In checkConv(attr(opt, "derivs"), opt$par, ctrl =
> >> control$checkConv,  :
> >>
> >> Model failed to converge with max|grad| = 0.0601483 (tol = 0.001,
> >> component 17)
> >>
> >> My model has 15 fixed effects and two (uncorrelated) random
> >> effects.
> >>
> >> There has been a lot of correspondence about convergence issues in
> >> the recent lme4 version(s) lately, but I cannot easily find what
> >> measure the "max|grad" is exactly pointing to.  If I'm right, it is
> >> the "relative gradient" of one of the model parameters, apparently
> >> parameter 17. But how exactly is this max|grad calculated? I found
> >> a command (coming from Ben Bolker):
> >>
> >> gg <- model7@optinfo$derivs$grad
> >>
> >> which produces gradients that are much larger than 0.0601483,
> >> probably since they are "absolute" gradients.
> >>
> >> In the book of Schnabel et al. I found a definition of the relative
> >>   gradient in their equation (7.2.3):
> >>
> >> Delta(f) * x  / f
> >>
> >> which I believe must here be interpreted as
> >>
> >> gradient * parameter estimate from glmer / log-likelihood
> >>
> >>
> >> Is this indeed the formula that is used in lme4 to derive the
> >> max|grad and is my interpretation of it correct? (I would like to
> >> reproduce the max|grad value 0.0601483).
> >>
> >> And which of the parameters in my model is actually "component 17"
> >>   (which the warning message refers to)?
> >>
> >> Thanks for any help!
> >>
> >> Ben Pelzer.
> >>
> >>
> >> *--------------------------.
> >>
> >> Below is part of the glmer output and also the result from "gg <-
> >> model7@optinfo$derivs$grad"
> >>
> >> Generalized linear mixed model fit by maximum likelihood (Laplace
> >>   Approximation) [glmerMod]
> >>  Family: binomial  ( logit )
> >> Formula: bottom10readA ~ 1 + female2 + (-1 + female2 | Country33) +
> >>     (1 | SCHOOLID2) + SES_mean_cen + age_cen + secondgen_mean +
> >>     native_mean + Parliament2013_cen + WLMP_cen + HDI2012_cen +
> >>     selage_cen + ce + ZSTAND2012C + Fselage2 + FCE2 + FZstand_pisa_cen2
> >> Control: glmerControl(optimizer = "nloptwrap",
> >>     optCtrl = list(algorithm = "NLOPT_LN_BOBYQA"))
> >>
> >>      AIC      BIC   logLik deviance df.resid
> >> 151434.4 151613.4 -75700.2 151400.4   276524
> >>
> >> Scaled residuals:
> >>     Min      1Q  Median      3Q     Max
> >> -4.6982 -0.3104 -0.1819 -0.1126 10.6450
> >>
> >> Random effects:
> >>  Groups    Name        Variance Std.Dev.
> >>  SCHOOLID2 (Intercept) 2.314767 1.52144
> >>  Country33 female2     0.008527 0.09234
> >> Number of obs: 276541, groups:  SCHOOLID2, 10643; Country33, 35
> >>
> >> Fixed effects:
> >>                      Estimate Std. Error z value Pr(>|z|)
> >> (Intercept)        -2.1629201  0.1006349 -21.493  < 2e-16 ***
> >> female2            -0.4316766  0.0523024  -8.253  < 2e-16 ***
> >> SES_mean_cen       -0.3901277  0.0257537 -15.148  < 2e-16 ***
> >> age_cen            -0.1685527  0.0256951  -6.560 5.39e-11 ***
> >> secondgen_mean     -0.2462713  0.1269396  -1.940   0.0524 .
> >> native_mean        -1.0927106  0.0844515 -12.939  < 2e-16 ***
> >> Parliament2013_cen -0.0020840  0.0025656  -0.812   0.4166
> >> WLMP_cen            0.0002831  0.0027028   0.105   0.9166
> >> HDI2012_cen        -0.0338573  0.0600986  -0.563   0.5732
> >> selage_cen          0.0525462  0.0119847   4.384 1.16e-05 ***
> >> ce                 -0.0902947  0.0496913  -1.817   0.0692 .
> >> ZSTAND2012C        -0.0457672  0.1760672  -0.260   0.7949
> >> Fselage2           -0.0092435  0.0096429  -0.959   0.3378
> >> FCE2               -0.0650998  0.0450328  -1.446   0.1483
> >> FZstand_pisa_cen2  -0.4586711  0.1497851  -3.062   0.0022 **
> >> ---
> >> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >>
> >>
> >> And finally the 17 gradients:
> >>
> >> gg
> >>
> >>  [1]  -2.3293884   4.3723284  -5.6278026   0.2851749   1.6813773  -8.3454128
> >>  [7]   4.1930703  -5.1109944  49.0449769 207.5065300  20.8115773 -31.4621360
> >> [13]  14.0848733  -3.2661238 -24.9956165   7.0817152  -5.9149812
> >>
> >>
> >>
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> _______________________________________________
> >> R-sig-mixed-models at r-project.org  mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>