[R-sig-ME] calculation max|grad value?
Ben Pelzer
b.pelzer at maw.ru.nl
Wed Apr 15 11:53:29 CEST 2015
Doug and Ben,
Thanks for your explanations! The milliseconds example is nice. Ben's
remark about the expected change in the deviance over the scale of one
standard error of the parameter is very helpful for understanding this
"relative gradient" measure.
But this also raises a question. I do not understand how Ben's
interpretation follows from the formula solve(hessian, gradient) or
from solve(chol(hessian), gradient). This is probably due to my rather
limited knowledge of MLE theory.

I know that the inverse of the information matrix gives the variances
of the estimates, and that the information matrix equals minus the
expected value of the Hessian. I also understand that
solve(hessian, gradient) computes inverse(hessian) %*% gradient. In
scalars, this looks like "dividing the gradient by the Hessian", i.e.
scaling the gradient, making it dimensionless as Doug pointed out.

But to translate this to standard errors of the estimates, it looks as
if there should be a minus sign and also a square root, something like
-sqrt(solve(hessian)) %*% gradient? But maybe I'm wrong; as I said,
limited knowledge.
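For what it's worth, a toy scalar sketch of the two scalings under
discussion (my own made-up numbers, not lme4 internals): near the optimum
of a quadratic deviance, solve(hessian, gradient) reduces to g/h, the
Newton step towards the optimum, while the standard-error scaling would
be g/sqrt(h):

```r
## Toy scalar sketch (my own numbers, not lme4 code) of the two scalings:
## solve(hessian, gradient) versus the standard-error scaling.
dev  <- function(x) 3 * (x - 2)^2   # toy deviance with minimum at x = 2
grad <- function(x) 6 * (x - 2)     # its first derivative
hess <- 6                           # its second derivative (constant)

x <- 2.1                            # a point close to the optimum
g <- grad(x)

g / hess        # "relative gradient": the Newton step, x minus the optimum
g / sqrt(hess)  # gradient in standard-error units, since SE ~ 1/sqrt(hess)
```

As far as I can tell, the sign question resolves itself because the
stored derivatives are of the deviance being *minimized*, whose Hessian
is already positive (semi-)definite near the optimum, so no extra minus
sign is needed in either scaling.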
Thanks once again for your help and your effort to explain all this!
Ben Pelzer.
On 13-4-2015 22:11, Douglas Bates wrote:
> On Mon, Apr 13, 2015 at 7:45 AM Ben Pelzer <b.pelzer at maw.ru.nl> wrote:
>
> Hi Dan,
>
>     Thanks for pointing me to that formula. Am I right that in case of
>     only one parameter (say a fixed intercept only), this measure means
>     that the gradient of the intercept-estimate is divided by the
>     (estimated) variance of the intercept? Could you explain to me the
>     rationale behind this measure, or to put it differently: why is it
>     meaningful to use it as a criterion to evaluate the quality of the
>     convergence?
>
>     In case of more than one parameter, it's even less clear to me what
>     the measure exactly expresses, and thus why it is used by glmer.
>
> Thanks again for any help/explanation,
>
> Ben Pelzer.
>
>
> There is some confusion here about what constitutes a parameter. The
> actual optimization in lmer is of a profiled log-likelihood that is a
> function of variance component parameters only. In the case of a
> model with a scalar random effects term that parameter is the ratio of
> the standard deviation of the random effects to the residual standard
> deviation.
>
> See http://arxiv.org/abs/1406.5823 for more details.
>
> The rationale for normalizing the gradient with respect to the Hessian
> is to make it scale invariant. (It turns out that the normalization
> used wasn't the correct one to use but we will change that.) Consider
> the sleepstudy data in the lme4 package. The response time is
> measured in milliseconds, as is common in such experiments, and a
> typical response time at the beginning of the experiment is around 250
> ms. Suppose that the responses were converted to seconds, so the
> typical response time was 0.250 sec. The scaling would affect the
> size of the gradient terms the same way. If the measurements were
> converted to microseconds all the gradient terms would be multiplied
> by 1000.
>
> The scaling is to make the quantity being compared to a fixed
> tolerance dimensionless. It is a basic principle that only
> dimensionless quantities can be compared to a fixed tolerance.
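Doug's unit-change argument can be checked numerically. Below is a
self-contained sketch (a toy Gaussian likelihood with made-up numbers,
not the actual sleepstudy fit): converting milliseconds to seconds
multiplies the raw gradient by 1000, while the gradient scaled by
sqrt(Hessian) is unchanged.

```r
## Toy check of unit-dependence: derivatives of a Gaussian negative
## log-likelihood for the mean, in milliseconds versus seconds.
set.seed(1)
y_ms  <- rnorm(100, mean = 250, sd = 30)  # hypothetical reaction times, ms
y_sec <- y_ms / 1000                      # the same data in seconds

## derivatives of the negative log-likelihood with respect to the mean
nll_grad <- function(y, mu, sigma) sum(mu - y) / sigma^2
nll_hess <- function(y, sigma) length(y) / sigma^2

mu_ms <- 240; sigma_ms <- 30              # evaluate away from the optimum
g_ms <- nll_grad(y_ms, mu_ms, sigma_ms)
h_ms <- nll_hess(y_ms, sigma_ms)
g_s  <- nll_grad(y_sec, mu_ms / 1000, sigma_ms / 1000)
h_s  <- nll_hess(y_sec, sigma_ms / 1000)

g_s / g_ms                                # ~1000: raw gradient depends on units
(g_s / sqrt(h_s)) / (g_ms / sqrt(h_ms))   # ~1: scaled gradient does not
```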
>
>
>
>
> On 11-4-2015 1:51, Daniel McCloy wrote:
> >
> > with a model called mod, you can get the relative gradient with
> >
> > relgrad <- with(mod@optinfo$derivs, solve(Hessian, gradient))
> > print(max(abs(relgrad)))
> >
> > -- dan
> >
> > Daniel McCloy
> > http://dan.mccloy.info/
> > Postdoctoral Research Fellow
> > Institute for Learning and Brain Sciences
> > University of Washington
> >
> >
> > On 04/10/2015 06:54 PM, Ben Pelzer wrote:
> >> Dear list,
> >>
> >> For a given model in glmer (lme4_1.1-7), I got the warning
> >> message:
> >>
> >> 3: In checkConv(attr(opt, "derivs"), opt$par, ctrl =
> >> control$checkConv, :
> >>
> >> Model failed to converge with max|grad| = 0.0601483 (tol = 0.001,
> >> component 17)
> >>
> >> My model has 15 fixed effects and two (uncorrelated) random
> >> effects.
> >>
> >> There has been a lot of correspondence about convergence issues in
> >> the recent lme4 version(s) lately, but I cannot easily find what
> >> measure the "max|grad|" is exactly pointing to. If I'm right, it is
> >> the "relative gradient" of one of the model parameters, apparently
> >> parameter 17. But how exactly is this max|grad| calculated? I found
> >> a command (coming from Ben Bolker):
> >>
> >> gg <- model7@optinfo$derivs$gradient
> >>
> >> which produces gradients that are much larger than 0.0601483,
> >> probably since they are "absolute" gradients.
> >>
> >> In the book of Schnabel et al. I found a definition of the relative
> >> gradient in their equation (7.2.3):
> >>
> >> grad(f) * x / f
> >>
> >> which I believe must now be interpreted, elementwise, as
> >>
> >> gradient * parameter estimates from glmer / log-likelihood
> >>
> >>
> >> Is this indeed the formula that is used in lme4 to derive the
> >> max|grad| value, and is my interpretation of it correct? (I would
> >> like to reproduce the max|grad| value 0.0601483.)
> >>
> >> And which of the parameters in my model is actually "component 17"
> >> (which the warning message refers to)?
> >>
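As a self-contained illustration (made-up numbers, not the real model7
derivatives), the quantity in the warning can be reproduced from a
gradient and a Hessian like so, with which.max identifying the
offending component:

```r
## Sketch of the max|grad| computation with toy numbers:
gradient <- c(-2.33, 4.37, 207.51, -5.91)   # toy absolute gradients
Hessian  <- diag(c(10, 5, 2500, 40))        # toy curvature matrix

relgrad <- solve(Hessian, gradient)   # Hessian-scaled ("relative") gradient
max(abs(relgrad))                     # the value compared to the tolerance
which.max(abs(relgrad))               # the offending "component" (here, 2)
```

Note how the largest absolute gradient (207.51) need not be the largest
relative gradient once each component is scaled by its curvature.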
> >> Thanks for any help!
> >>
> >> Ben Pelzer.
> >>
> >>
> >> --------------------------
> >>
> >> Below is part of the glmer output and also the result from
> >> "gg <- model7@optinfo$derivs$gradient"
> >>
> >> Generalized linear mixed model fit by maximum likelihood (Laplace
> >> Approximation) [glmerMod]
> >> Family: binomial ( logit )
> >> Formula: bottom10readA ~ 1 + female2 + (-1 + female2 | Country33) +
> >>     (1 | SCHOOLID2) + SES_mean_cen + age_cen + secondgen_mean +
> >>     native_mean + Parliament2013_cen + WLMP_cen + HDI2012_cen +
> >>     selage_cen + ce + ZSTAND2012C + Fselage2 + FCE2 + FZstand_pisa_cen2
> >> Control: glmerControl(optimizer = "nloptwrap",
> >>     optCtrl = list(algorithm = "NLOPT_LN_BOBYQA"))
> >>
> >>      AIC       BIC    logLik  deviance  df.resid
> >> 151434.4  151613.4  -75700.2  151400.4    276524
> >>
> >> Scaled residuals:
> >>     Min      1Q   Median      3Q     Max
> >> -4.6982 -0.3104  -0.1819 -0.1126 10.6450
> >>
> >> Random effects:
> >>  Groups     Name         Variance  Std.Dev.
> >>  SCHOOLID2  (Intercept)  2.314767  1.52144
> >>  Country33  female2      0.008527  0.09234
> >> Number of obs: 276541, groups: SCHOOLID2, 10643; Country33, 35
> >>
> >> Fixed effects:
> >>                      Estimate  Std. Error  z value  Pr(>|z|)
> >> (Intercept)        -2.1629201   0.1006349  -21.493   < 2e-16 ***
> >> female2            -0.4316766   0.0523024   -8.253   < 2e-16 ***
> >> SES_mean_cen       -0.3901277   0.0257537  -15.148   < 2e-16 ***
> >> age_cen            -0.1685527   0.0256951   -6.560  5.39e-11 ***
> >> secondgen_mean     -0.2462713   0.1269396   -1.940    0.0524 .
> >> native_mean        -1.0927106   0.0844515  -12.939   < 2e-16 ***
> >> Parliament2013_cen -0.0020840   0.0025656   -0.812    0.4166
> >> WLMP_cen            0.0002831   0.0027028    0.105    0.9166
> >> HDI2012_cen        -0.0338573   0.0600986   -0.563    0.5732
> >> selage_cen          0.0525462   0.0119847    4.384  1.16e-05 ***
> >> ce                 -0.0902947   0.0496913   -1.817    0.0692 .
> >> ZSTAND2012C        -0.0457672   0.1760672   -0.260    0.7949
> >> Fselage2           -0.0092435   0.0096429   -0.959    0.3378
> >> FCE2               -0.0650998   0.0450328   -1.446    0.1483
> >> FZstand_pisa_cen2  -0.4586711   0.1497851   -3.062    0.0022 **
> >> ---
> >> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >>
> >>
> >> And finally the 17 gradients:
> >>
> >> gg
> >>
> >> [1]   -2.3293884   4.3723284  -5.6278026   0.2851749   1.6813773  -8.3454128
> >> [7]    4.1930703  -5.1109944  49.0449769 207.5065300  20.8115773 -31.4621360
> >> [13]  14.0848733  -3.2661238 -24.9956165   7.0817152  -5.9149812
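On the component-17 question: a hedged sketch, assuming (as for glmer
fits where both sets are optimized) that the parameter vector is the two
random-effect parameters (theta) followed by the fifteen fixed-effect
coefficients (beta), in the order shown in the output above. The names
below are transcribed from that output; the theta labels are my own.

```r
## Hypothetical mapping of "component 17" onto this model's parameters,
## assuming the order is theta (random effects) then beta (fixed effects):
theta_names <- c("SCHOOLID2.(Intercept)", "Country33.female2")
beta_names  <- c("(Intercept)", "female2", "SES_mean_cen", "age_cen",
                 "secondgen_mean", "native_mean", "Parliament2013_cen",
                 "WLMP_cen", "HDI2012_cen", "selage_cen", "ce",
                 "ZSTAND2012C", "Fselage2", "FCE2", "FZstand_pisa_cen2")
par_names <- c(theta_names, beta_names)
par_names[17]   # under this assumption, the last fixed effect
```

Under that assumption, component 17 would be FZstand_pisa_cen2, the
final fixed effect.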
> >>
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> R-sig-mixed-models@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>
>