[R-sig-ME] Ben's Point about Centering and GLMM (was: Re: Low intercept estimate in a binomial glmm)

Thu Apr 11 21:23:13 CEST 2013

A quick response, and any further discussion had better proceed
in a more appropriate place (though this affects mixed models as
much as linear models): 

For the nihills data in the DAAG package:

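library(DAAG)   # provides the nihills data frame
## gradient = climb per unit distance
nihills$gradient <- with(nihills, climb/dist)
## log-transform every column, prefixing the names with "l"
lognihills <- log(nihills)
names(lognihills) <- paste("l", names(nihills), sep="")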

lognihills.lm <- lm(ltime ~ ldist + lclimb, data=lognihills) 
round(coef( lognihills.lm) ,3)
(Intercept)       ldist      lclimb 
    -4.961       0.681       0.466 

lognigrad.lm <- lm(ltime ~ ldist + lgradient , data=lognihills) 
round(coef( lognigrad.lm) ,3)
(Intercept)       ldist   lgradient 
    -4.961       1.147       0.466 

I have no interest in how the time may change with distance when
climb is held constant (as races get longer the gradient reduces, and
hence the somewhat counter-intuitive coefficient of 0.681 for ldist).

The coefficient of 1.147 for ldist when gradient is held constant does
make sense -- at a fixed gradient, time increases more than
proportionally with distance, so longer races are run at a slower pace.
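
The relation between the two sets of coefficients can be sketched as
follows, using only the definition gradient = climb/dist from the code
above. The symbols \beta and \gamma are labels introduced here for
illustration; they are not notation from the earlier posts.

```latex
\begin{align*}
\text{lgradient} &= \log(\text{climb}/\text{dist})
                  = \text{lclimb} - \text{ldist} \\
\text{ltime} &= \beta_0 + \beta_1\,\text{ldist} + \beta_2\,\text{lclimb} \\
             &= \beta_0 + (\beta_1 + \beta_2)\,\text{ldist}
                + \beta_2\,(\text{lclimb} - \text{ldist}) \\
             &= \gamma_0 + \gamma_1\,\text{ldist} + \gamma_2\,\text{lgradient},
\qquad \gamma_0 = \beta_0,\quad \gamma_1 = \beta_1 + \beta_2,\quad
       \gamma_2 = \beta_2 .
\end{align*}
```

With the rounded estimates above, 0.681 + 0.466 = 1.147, which is the
ldist coefficient in the second fit; the intercept and the coefficient
of the climb-related term are unchanged.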

Sure, the predicted values and SEs of predicted values are the 
same in the two cases, if one translates from one space to the other.
That does not affect my point. 

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 12/04/2013, at 1:42 AM, Paul Johnson <pauljohn32 at gmail.com> wrote:

> On Fri, Apr 5, 2013 at 1:24 AM, John Maindonald
> <john.maindonald at anu.edu.au> wrote:
>> Surely it is an issue of how you define multi-collinearity.
>> 
> I don't think so. The definition is the same, but multi-collinearity's
> effect is different for every point in the X space.  I mean, the
> elevation in variance estimates due to multi-collinearity depends on
> where you place the y axis. The point estimates that appear in
> regression output are different when you center because you move the y
> axis about by centering.  But if you fit in one spot, and then project
> the answer over to the other spot, the answer you get about slope,
> standard error, etc is all the same. In either model.
> 
> Centering appeals to many practitioners because it seems to give
> parameters with smaller standard errors, but it's an illusion.
> Uncertainty about predictions is hour-glass shaped in the X space, and
> if you go into the middle, you have less uncertainty.
> 
>> Re-parameterisation may however give
>> parameters that are much more interpretable, with much
>> reduced correlations and standard errors.  That is the
>> primary reason, if there is one, for doing it.
>> 
> 
> I think that's a mistake, and have the examples in the rockchalk
> vignette to demonstrate it.  If you say "what is the slope when
> observed X = x", and "what is the uncertainty of your estimate when X
> = x?" all of these models give exactly the same answer.
> 
> But back to Ben's point about GLMM.  That's an eye opener.
> 
> I'd like to make a working example of the problem that centering
> affects estimates (beyond rounding error).  I need to know a test case
> that is likely to produce the effect mentioned before I can go any
> further.
> 
> pj
> --
> Paul E. Johnson
> Professor, Political Science      Assoc. Director
> 1541 Lilac Lane, Room 504      Center for Research Methods
> University of Kansas                 University of Kansas
> http://pj.freefaculty.org               http://quant.ku.edu