[R-sig-ME] Mixed-models and condition number

Thu Feb 5 09:12:49 CET 2009

Dear Stephan,

thank you very much for your response and the detailed list of 
literature. I knew about Belsley (1991b) and used it on the design 
matrix of the fixed-effects. My (purely empirical) results were the 
following:

On a small data set of 58 values, I fitted the mixed-effects model
mymodel=lme(log(calcium) ~ soil.horizon+flow.region+content.of.silt, 
data=mydata, random=~1|plot)
where soil.horizon and flow.region are factors and content.of.silt is a 
continuous covariate.

1. Mean-centering the continuous covariate decreased the collinearity 
between the intercept term and the continuous covariate (as judged by 
corFixed from summary.lme and by the Belsley (1991b) diagnostics on the 
design matrix of the fixed-effects) and decreased kappa (of the same 
design matrix) by a factor of 12.
2. Scaling the covariate to obtain fixed-effects estimates of comparable 
size decreased kappa by a factor of 4, but had no effect on the 
correlation of the fixed-effects.
3. I compared the kappas of the mixed-effects design matrix with those 
of the "triangular matrix derived from the fixed-effects model matrix 
after removing the random effects" in lme4, as proposed by Douglas 
Bates. The influence of mean-centering and scaling on kappa was 
comparable for both matrices. The kappas of the triangular matrix and 
of the design matrix of the fixed-effects differed little for the 
mean-centered and scaled model, but largely for the non-scaled and 
non-centered one.
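
For anyone who wants to try this kind of comparison, here is a minimal 
sketch in base R of how kappa of a fixed-effects design matrix can be 
computed before and after centering and scaling. The data are simulated 
and the names n, silt and horizon are hypothetical stand-ins, not my 
soil data, so the factors of 12 and 4 reported above will not be 
reproduced; only the direction of the change should be similar.

```r
# Simulated stand-in data (hypothetical; NOT the soil data from this post).
set.seed(1)
n <- 58
silt <- runif(n, min = 20, max = 60)                 # covariate far from zero
horizon <- factor(rep(c("A", "B"), length.out = n))  # two-level factor

# Fixed-effects design matrices: raw, mean-centered, centered and scaled.
X_raw      <- model.matrix(~ horizon + silt)
X_centered <- model.matrix(~ horizon + I(silt - mean(silt)))
X_scaled   <- model.matrix(~ horizon + as.numeric(scale(silt)))

# Exact 2-norm condition number: largest / smallest singular value.
k <- c(raw      = kappa(X_raw, exact = TRUE),
       centered = kappa(X_centered, exact = TRUE),
       scaled   = kappa(X_scaled, exact = TRUE))
print(k)
```

Note that kappa() returns only a cheap estimate unless exact = TRUE is 
given, so I would use the exact version for comparisons like these.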

I will try to find a mathematician at my university who would like to 
play around with mixed-models  ;-).

Thanks again

Christina

Stephan Kolassa wrote:
> Hi Christina,
> let me start by saying that I don't know of anyone looking at
> conditioning of design matrices in a mixed model environment. Might be a
> nice topic to have an M. Sc. student play around with empirically. The 
> problem with ill-conditioning in fixed-effects models basically comes 
> down to high variances in the parameter estimates, so one could 
> actually build a mixed model with an ill-conditioned design matrix and 
> play around with small changes to simulated observations, checking 
> whether inferences or estimates exhibit "large" variance.
>
> If you find out anything about this, would you let me know?
>
> That said, my recent interest has been in collinearity between
> predictors, which is not exactly conditioning, but reasonably close to
> it. I'd recommend you look at Hill & Adkins (2001) and the collinearity
> diagnostics they recommend. Belsley (1991a) wrote an entire monograph
> about them, but there are also shorter introductions, e.g., Belsley 
> (1991b).
>
> Scaling the columns of X to equal Euclidean length (usually to length 1)
> before diagnosing collinearity appears to be accepted procedure, so I 
> think scaling would be a good starting point in the mixed model, too. 
> However, there is a discussion as to whether to first remove the
> constant column from X and subtract the column mean from each of the 
> remaining columns.
>
> Marquardt (1980) claims that centering removes "nonessential ill
> conditioning." Weisberg (1980) and Montgomery and Peck (1982) also 
> advocate centering.
>
> Other practitioners maintain that centering removes meaningful
> information from X, such as collinearity with the constant column, and 
> should not be used (Belsley et al., 1980; Belsley, 1984a, 1984b, 1986, 
> 1991a, 1991b; Echambadi & Hess, 2007; Hill & Adkins, 2001). For 
> example, Simon and Lesage (1988) found that collinearity with the 
> constant column introduces numerical instability, which is mitigated 
> but not prevented by employing collinearity diagnostics after 
> centering X. In addition, these problems are not confined to the 
> constant coefficient, but extend to all estimates.
>
> For a very lively debate on this topic see Belsley (1984a); Cook (1984);
> Gunst (1984); Snee and Marquardt (1984); Wood (1984); Belsley (1984b). 
> The consensus seems to be that centering cannot once and for all be 
> advised or rejected; rather, whether or not to center data depends on 
> the problem one is facing.
>
> HTH,
> Stephan
>
>
> * Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression 
> Diagnostics: Identifying Influential Data and Sources of Collinearity. 
> New York, NY: John Wiley & Sons.
>
> * Belsley, D. A. (1984a, May). Demeaning Conditioning Diagnostics 
> through Centering. The American Statistician, 38(2), 73-77.
>
> * Belsley, D. A. (1984b, May). Demeaning Conditioning Diagnostics 
> through Centering: Reply. The American Statistician, 38(2), 90-93.
>
> * Belsley, D. A. (1986). Centering, the constant, first-differencing, 
> and assessing conditioning. In E. Kuh & D. A. Belsley (Eds.), Model 
> Reliability (pp. 117-153). Cambridge: MIT Press.
>
> * Belsley, D. A. (1987). Collinearity and Least Squares Regression: 
> Comment -- Well-Conditioned Collinearity Indices. Statistical Science, 
> 2(1), 86-91. Available from http://projecteuclid.org/euclid.ss/1177013441
>
> * Belsley, D. A. (1991a). Conditioning Diagnostics: Collinearity and 
> Weak Data in Regression. New York, NY: Wiley.
>
> * Belsley, D. A. (1991b, February). A Guide to using the collinearity 
> diagnostics. Computational Economics, 4(1), 33-50. Available from 
> http://www.springerlink.com/content/v135h6631x412kk8/
>
> * Cook, R. D. (1984, May). Demeaning Conditioning Diagnostics through 
> Centering: Comment. The American Statistician, 38(2), 78-79.
>
> * Echambadi, R., & Hess, J. D. (2007, May-June). Mean-Centering Does 
> Not Alleviate Collinearity Problems in Moderated Multiple Regression 
> Models. Marketing Science, 26(3), 438-445.
>
> * Golub, G. H., & Van Loan, C. F. (1996). Matrix Computations (3rd 
> ed.). Baltimore: Johns Hopkins University Press.
>
> * Gunst, R. F. (1984, May). Comment: Toward a Balanced Assessment of 
> Collinearity Diagnostics. The American Statistician, 38(2), 79-82.
>
> * Hill, R. C., & Adkins, L. C. (2001). Collinearity. In B. H. Baltagi 
> (Ed.), A Companion to Theoretical Econometrics (pp. 256-278). Oxford: 
> Blackwell.
>
> * Marquardt, D. W. (1987). Collinearity and Least Squares Regression: 
> Comment. Statistical Science, 2(1), 84-85. Available from 
> http://projecteuclid.org/euclid.ss/1177013440
>
> * Montgomery, D. C., & Peck, E. A. (1982). Introduction to Linear 
> Regression Analysis. New York, NY: John Wiley.
>
> * Simon, S. D., & Lesage, J. P. (1988, January). The impact of 
> collinearity involving the intercept term on the numerical accuracy of 
> regression. Computational Economics (formerly Computer Science in 
> Economics and Management), 1(2), 137-152.
>
> * Snee, R. D., & Marquardt, D. W. (1984, May). Comment: Collinearity 
> Diagnostics Depend on the Domain of Prediction, the Model, and the 
> Data. The American Statistician, 38(2), 83-87.
>
> * Weisberg, S. (1980). Applied Linear Regression. New York, NY: John 
> Wiley.
>
> * Wood, F. S. (1984, May). Comment: Effect of Centering on 
> Collinearity and Interpretation of the Constant. The American 
> Statistician, 38(2), 88-90.
>
>
>
> Christina Bogner wrote:
>> Dear list members,
>>
>> I'm working with both nlme and lme4 packages trying to fit linear 
>> mixed-models to soil chemical and physical data. I know that for
>> linear models one can calculate the condition number kappa of the
>> model matrix to know whether the problem is well- or ill-conditioned.
>> Does it make any sense to compute kappa on the design matrix of the
>> fixed-effects in nlme or lme4? For comparison I fitted a simple
>> linear model to my data and scaling some numerical predictors
>> decreased kappa considerably. So I wonder if scaling them in the
>> mixed-model has any advantages?
>>
>> Thanks a lot for your help.
>>
>> Christina Bogner
>>
>