[R-sig-ME] Question about what is "shrinkage"...

Daniel Wright Daniel.Wright at act.org
Fri Sep 20 16:37:47 CEST 2013


Hi Emmanuel and others,

I would think of shrinkage as a characteristic result of what many estimation methods do, rather than as a method in itself. The Efron and Morris paper focused on showing that this characteristic can be good, and they discuss it in light of some popular methods of the time (they wrote several influential and more technical papers on empirical Bayes during this period). A lot of mixed/multilevel folks discuss different methods, but the value of this characteristic is well illustrated by Bates' last email and the phrase "borrowing strength". So, yes, the values in caterpillar plots and the like (conditional modes, though often called level 2 residuals) are estimates for the individual units that borrow information from the other units, and so are "shrunken".
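
For example, lme4 will give you these conditional modes directly via ranef(); here is a minimal sketch using the sleepstudy data shipped with lme4 (just the standard example model, not anything from this thread), plotting the modes with their conditional intervals, which is the usual caterpillar-style display:

library(lme4)
library(lattice)
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
re <- ranef(fm, condVar = TRUE)  # conditional modes plus their conditional variances
dotplot(re)                      # dotplot of the modes with intervals, one panel per random term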

One way to show the effect of "shrinking" is to plot all the individual regression lines (just two variables) and then plot the lines whose slopes and intercepts are estimated with shrinkage. An example is the comparison of Figures 3.7 and 3.8 in Kreft and de Leeuw's Introducing Multilevel Modeling. Here is some code to make an example. The plot on the left shows the OLS estimates; two of the level 2 units are very different from the others. The shrunken estimates on the right borrow information from the other eight, so the slopes of these two units, while still different from the rest, are a little less extreme.

set.seed(818)
library(lme4)

## Simulate 10 level-2 units, each with its own intercept and slope
lev2 <- rep(1:10, 10)
x <- rnorm(100)
y <- rep(rnorm(10), 10) + (rep(rnorm(10), 10) + 2) * x + rnorm(100)

par(mfrow = c(1, 2))

## Left panel: a separate OLS line for each level-2 unit
plot(x, y, cex = .5)
for (i in 1:10)
  abline(lm(y[lev2 == i] ~ x[lev2 == i]))

## Right panel: shrunken (partially pooled) lines from lmer
plot(x, y, cex = .5)
m1 <- coef(lmer(y ~ x + (x | lev2)))$lev2
for (i in 1:10)
  abline(m1[i, 1], m1[i, 2])
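
If you want the same comparison numerically rather than graphically, one small sketch (assuming the simulated y, x and lev2 above, collected into a data frame) is to put the per-unit OLS coefficients from lmList next to the shrunken ones from lmer:

d <- data.frame(y = y, x = x, lev2 = factor(lev2))
ols <- coef(lmList(y ~ x | lev2, data = d))            # separate OLS fit per unit
shr <- coef(lmer(y ~ x + (x | lev2), data = d))$lev2   # shrunken (partially pooled) fits
cbind(ols, shr)  # intercepts and slopes side by side; the shrunken ones are pulled toward the overall fit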

Dan


-----Original Message-----
From: Emmanuel Curis [mailto:emmanuel.curis at parisdescartes.fr] 
Sent: Thursday, September 19, 2013 10:41 AM
To: Daniel Wright; bates at stat.wisc.edu
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Question about what is "shrinkage"...

Hello,

Thanks Daniel for the link to the article, which is clear even though I indeed know nothing about baseball, and Douglas for the detailed answer. Quite interestingly, the article is more on the side of the estimator and Douglas' answer more on the side of the reduced variance, at least as I understand them, but I think I am beginning to understand the link between the two.

But there are still a few questions I have, some of them philosophical...

When reading the paper, the two examples correspond to setups that could be handled by random-effect models (the baseball players or the towns). In fact, at the end of the paper, the idea that the individual mean values come from a random variable is mentioned.

Does this mean that the individual means obtained from random-effect models as used in lmer, for instance, are themselves a kind of shrinkage estimator, that is, already corrected by a shrinkage factor, though not given by a formula similar to the one cited in the paper? I know that the random effects themselves are not (conditional) means but modes, but when added to the fixed-effects part, which corresponds to the mean (at least in linear models), aren't they comparable to (shrunken) means?
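
(For what it's worth, a quick way to see this in lme4, as a sketch using the simulated y, x and lev2 from Dan's example above: coef() on a fitted lmer model returns, per unit, the fixed effects plus the conditional modes, i.e. the shrunken per-unit intercepts and slopes.)

m <- lmer(y ~ x + (x | lev2))  # same model as in the example above
fixef(m)       # overall (fixed-effect) intercept and slope
ranef(m)$lev2  # conditional modes: per-unit deviations from the fixed effects
coef(m)$lev2   # fixed effects plus conditional modes: the shrunken per-unit coefficients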

Would this, in that case, be an argument for preferring random effects over fixed effects when the number of levels is "high" (>= 3 if I read the paper correctly, though perhaps another limit applies for such models and for cases where the variance is unknown and estimated?), besides convergence problems, and for preferring fixed effects below that threshold even if philosophically a random effect would be appropriate (experiments on only two patients, say)? And is there a link between the efficiency of the shrinkage effect and the ability to estimate the variance correctly?

This would also explain how it is possible to associate a shrinkage value with each random effect...

As far as I could see, however, the shrinkage estimator can also improve regression coefficients when there are more than 3 of them. Does this still hold when dealing with multidimensional vectors whose components represent very different things? And for regression coefficients, if the shrunken version gives better values, wouldn't it be logical to build tests on these coefficients using the shrunken values?
Is that possible? (But these questions are on the verge of being off-topic, I guess.)

My other concern is about the use of shrinkage as a diagnostic. If I understood Douglas' answer correctly, the amount of shrinkage measures how informative a single patient's data are for estimating that patient's own value. Hence, if shrinkage is large, does it mean that the model is not suitable for making individual predictions, only average ones, and is therefore useless in population PK for adapting doses, for instance? Are there any guidelines for what counts as an acceptable amount of shrinkage? And does it have other uses for model diagnostics and interpretation?

Last point: I understand well from the paper how to calculate the shrinkage factor (there seem to be several different but close formulas depending on the reference, but I guess these are only variants?), using the values obtained for each individual. But for several linear models, as mentioned by Douglas, it is not possible to obtain individual parameters. In such cases, how is shrinkage computed or estimated?

Thanks again in advance for any answer,

-- 
                                Emmanuel CURIS
                                emmanuel.curis at parisdescartes.fr

Page WWW: http://emmanuel.curis.online.fr/index.html


