[R] all subsets for glm

Tue Apr 7 11:01:54 CEST 2009

> If you actually want to find the best subsets, you can get a good 
> approximation by using leaps on the weighted least squares fit that
> is the last iteration of the IWLS algorithm for fitting the glm.
> 
> Running regsubsets with a reasonably large value of nbest and then 
> refitting the top models as glms afterwards will fairly reliably
> give the best glms.
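
For reference, the suggestion quoted above could be sketched in R
roughly as follows. This is only a minimal illustration under stated
assumptions, not the original poster's code: the data are simulated,
the logistic (binomial) family, the variable names x1..x6, nbest = 5
and ranking by AIC are all choices made for the example. The working
weights and working response are taken from the converged full glm
fit, which is the weighted least squares problem solved in the last
IWLS iteration.

```r
library(leaps)  # provides regsubsets()

# Simulated data (assumption: logistic model with 6 candidate predictors)
set.seed(1)
n <- 200; p <- 6
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- rbinom(n, 1, plogis(0.5 + X[, 1] - 0.8 * X[, 2]))
dat <- data.frame(y = y, X)

# Full glm fit; IWLS runs inside glm()
full <- glm(y ~ ., data = dat, family = binomial)

# Quantities of the final IWLS iteration:
# working weights and working response (linear predictor + working residual)
w <- full$weights
z <- full$linear.predictors + residuals(full, type = "working")

# All-subsets search on the weighted least squares approximation
rs <- regsubsets(x = X, y = z, weights = w, nbest = 5, nvmax = p)
which_mat <- summary(rs)$which[, -1, drop = FALSE]  # drop intercept column
vars <- colnames(which_mat)

# Refit every candidate subset as a proper glm and rank by AIC
idx <- seq_len(nrow(which_mat))
label <- vapply(idx, function(i) paste(vars[which_mat[i, ]], collapse = " + "),
                character(1))
aic <- vapply(idx, function(i) {
  f <- reformulate(vars[which_mat[i, ]], response = "y")
  AIC(glm(f, data = dat, family = binomial))
}, numeric(1))

ranked <- data.frame(model = label, aic = aic)[order(aic), ]
head(ranked, 10)
```

The ranking from regsubsets itself is only approximate for the glm,
which is why the candidates are refitted before they are compared.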

Thanks, that sounds interesting. I am as yet clueless about the workings
of IWLS, so maybe this is nonsense: The result of running glm on the
full model (all variables) is an extreme example of overfitting, i.e.
zero residuals, all R_i^2 close to 1, large coefficients. Would the
"weighted least squares fit of the last iteration of IWLS" then not be
pretty meaningless?

> Whether this is better than lasso depends on what you are trying to
> do - IMO the only point of all-subsets regression is to get many best
> models rather than a single one, and lasso doesn't do at all well at
> that.

Yes, I am trying to get a number of best models, since the final model
selection will be based on interpretability and expert knowledge. So
far I have bootstrapped the lasso (using glmpath) to generate such a
set, but the resulting models are very similar and I suspect there is
a larger variety of "best models".

Harald