[R] lasso based selection for mixed model
Dave Fournier
otter at otter-rsch.com
Fri Nov 6 19:16:34 CET 2009
I have been working on lasso-type model selection
using AD Model Builder's random effects module.
I came across a few questions about this topic on the list,
so I thought this would be of interest.
The advantage of AD Model Builder is that it lets you formulate
general nonlinear random effects models, which I believe is
necessary for the lasso-type approach. I think the approach would work
for any kind of RE model, such as, say, a negative binomial mixed model,
but let's keep it simple for now.
Let the fixed-effect parameters be a_i.
The usual lasso penalty is then
p * sum_i abs(a_i)
with a tunable penalty weight p.
Since AD Model Builder's random effects module can deal with arbitrary
nonlinear random effects models, I tried simply adding this penalty to the -LL.
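In pseudo-R terms, just as a sketch of the idea (this is not the ADMB code;
the function names here are only illustrative), the penalized objective is
something like:

# Sketch only, not the ADMB code: a lasso-penalized objective.
# 'neg_loglik' stands in for the mixed model's negative log-likelihood,
# 'a' is the vector of fixed-effect coefficients, 'p' the penalty weight.
lasso_penalized_nll <- function(a, p, neg_loglik) {
  neg_loglik(a) + p * sum(abs(a))
}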
However, what happens is that in order to get the penalty large enough to
produce the lasso effect, the estimation procedure simply makes all the a_i
too small and uses the RE's instead to fit the data. I figured what was
needed was a penalty term which is more encouraging of setting most of
the terms to 0 and making a few of them large. Using a square root
has this property, so I tried
p * sum_i sqrt(abs(a_i))
Of course this is not differentiable at 0, so I smoothed it off a
bit there, but that is the general idea.
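For what it's worth, one possible smoothing (again just an R sketch of the
idea; the exact form I used in the ADMB code may differ) is to replace
abs(a) by sqrt(a^2 + eps), which is differentiable everywhere for eps > 0:

# Sketch in R of a smoothed sqrt(abs()) penalty (illustrative, not the ADMB code).
# Replacing abs(a) by sqrt(a^2 + eps) gives (a^2 + eps)^(1/4), which
# tends to sqrt(abs(a)) as eps -> 0 and is smooth at 0 for eps > 0.
sqrt_abs_penalty <- function(a, p, eps = 1e-8) {
  p * sum((a^2 + eps)^0.25)
}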
To test it I generated data of the form
Y_ij = a(1)*X_1i + u_i + e_ij
for 1<=i<=500, 1<=j<=5,
with a(1)=2.0, std dev of u_i = 0.6, std dev of e_ij = 1.5,
and the IV X_1i standard normal.
For the other IV's I generated 9 more IV's X_ki for 2<=k<=10
with X_ki = X_1i + 0.1*V_ki
where the V_ki's are standard normal.
The correlation matrix for them is
1 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995
0.995 1 0.99 0.99 0.989 0.988 0.99 0.989 0.99 0.99
0.995 0.99 1 0.99 0.989 0.99 0.99 0.99 0.989 0.99
0.995 0.99 0.99 1 0.99 0.989 0.99 0.99 0.99 0.99
0.995 0.989 0.989 0.99 1 0.989 0.99 0.989 0.99 0.99
0.995 0.988 0.99 0.989 0.989 1 0.99 0.99 0.989 0.99
0.995 0.99 0.99 0.99 0.99 0.99 1 0.991 0.99 0.991
0.995 0.989 0.99 0.99 0.989 0.99 0.991 1 0.988 0.99
0.995 0.99 0.989 0.99 0.99 0.989 0.99 0.988 1 0.99
0.995 0.99 0.99 0.99 0.99 0.99 0.991 0.99 0.99 1
So these are highly correlated explanatory variables.
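If anyone wants to reproduce the simulated data in R rather than ADMB, it is
just the following (variable names here are only illustrative):

# Sketch in R of the simulated data described above.
set.seed(1)
n_i <- 500; n_j <- 5                     # 500 groups, 5 observations each
a1  <- 2.0                               # true coefficient on X_1
X1  <- rnorm(n_i)                        # X_1i standard normal
X   <- cbind(X1, sapply(2:10, function(k) X1 + 0.1 * rnorm(n_i)))    # X_ki = X_1i + 0.1*V_ki
u   <- rnorm(n_i, sd = 0.6)              # random effects u_i
Y   <- sapply(1:n_j, function(j) a1 * X1 + u + rnorm(n_i, sd = 1.5)) # Y_ij, n_i x n_j
round(cor(X), 3)                         # gives a correlation matrix like the one above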
Then I fit the model with no lasso penalty.
index name value std dev
1 a 3.1565e+00 1.2302e+00
2 a 6.7390e-02 3.8737e-01
3 a -1.9154e-01 4.0204e-01
4 a -2.7920e-01 3.9847e-01
5 a 3.0924e-02 3.9842e-01
6 a 7.6508e-02 3.8698e-01
7 a -5.3902e-01 4.1590e-01
8 a -2.4941e-01 4.0141e-01
9 a 2.5324e-01 3.8875e-01
10 a -2.9435e-01 4.1345e-01
11 sig_u 8.2845e-01 3.0015e-02
So a(1) is rather badly estimated, and its estimated std dev is large.
One way to try to deal with this is ridge regression. This amounts to
adding the following quadratic penalty:
p * sum_i square(a_i)
Of course we know that this will have exactly the opposite effect to
what we want, that is, it tends to make all the estimated parameters nonzero.
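In the same illustrative sketch notation as above, the ridge penalty is simply:

# Ridge (quadratic) penalty, in the same illustrative style as the sketches above.
ridge_penalty <- function(a, p) p * sum(a^2)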
With a value of p=1.0 I got the following estimates.
index name value std dev
1 a 8.7729e-01 5.9032e-01
2 a 2.9461e-01 3.2352e-01
3 a 8.3218e-02 3.3546e-01
4 a 8.3569e-03 3.3438e-01
5 a 2.7345e-01 3.2984e-01
6 a 2.5172e-01 3.2822e-01
7 a -1.9834e-01 3.4721e-01
8 a 2.7677e-02 3.3404e-01
9 a 4.0135e-01 3.2855e-01
10 a 5.7042e-03 3.4325e-01
11 sig_u 8.3188e-01 3.0163e-02
So the std devs are smaller, but the results are really bad.
Now the fun part. With a penalty weight p=10 and the
sqrt(abs()) penalty I got:
index name value std dev
1 a 2.0191e+00 4.0619e-02
2 a 6.7300e-06 1.2917e-03
3 a 3.7718e-06 1.2915e-03
4 a 3.0421e-06 1.2914e-03
5 a 6.0954e-06 1.2917e-03
6 a 5.9105e-06 1.2916e-03
7 a 4.2511e-07 1.2914e-03
8 a 2.4084e-06 1.2914e-03
9 a 8.3686e-06 1.2919e-03
10 a 2.7657e-06 1.2914e-03
11 sig_u 8.3226e-01 3.0119e-02
So this looks like a promising approach.
Of course the real test for this is: does it
work for real data? If anyone wants to cooperate
on this, it could be fun.
Dave
--
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8L 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com