[RsR] Maximum number of variables with lmRob? Error: singular matrix encountered

Mon Sep 3 20:14:25 CEST 2012

Dear all,

Sorry for taking so long to get in touch again. I really appreciate the
insights so far, and especially the detailed answer given by Manuel.

> Great that it works with lmrob :-). Are the differences really that
> "substantial"? Even when taking the standard error into account?
> Of course, you would expect "some" differences, because the two
> methods do not use the exact same algorithms and also use different
> default configuration parameters:

Indeed, depending on the model I run, some results differ considerably
between lmrob and lmRob, most clearly in the p-values. But you are right
that (a) this is not the usual pattern: results are generally similar,
although lmrob seems to be more efficient; and (b) once the standard
errors are taken into account, most of the differences no longer look
that large.

> - For datasets with categorical and continuous variables, lmRob uses
> an M-S-estimate as initial estimate (replacing the S-estimate). It is
> basically an L1-estimate for the categorical part and an S-estimate
> for the continuous part. lmrob does not need to split the estimation
> into categorical and continuous parts, it just computes an S-estimate.
> That's one source of differences.

Yes, this is a very good point. The models I have been trying always mix
continuous and binary variables, so lmRob goes directly for an M-S
estimate. After reading your email, I fitted separate models for the
continuous and for the binary variables, with both lmrob and lmRob. For
the models with only continuous variables, the differences in results
were always tiny. I got the same result with simulated data and models,
so I think most of the differences I have been seeing between lmrob and
lmRob are due to their different approaches to the mix of continuous and
categorical variables.
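
Before comparing the two functions, it may help to check how much of a
difference comes from the random subsampling alone (Manuel's remark that
lmrob gives slightly different results for different seeds). The
following is only a minimal sketch on simulated data; the variable names
(x1, x2, b1), the sample size and the outlier shift are made up for
illustration and are not my actual dataset.

```r
library(robustbase)  # provides lmrob()

# Simulated data (hypothetical): two continuous predictors, one binary
set.seed(42)
n <- 200
d <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  b1 = rbinom(n, 1, 0.4)
)
d$y <- 1 + 2 * d$x1 - d$x2 + 0.5 * d$b1 + rnorm(n)
d$y[1:10] <- d$y[1:10] + 15   # contaminate a few responses

# The initial S-estimate is based on random subsamples, so the seed is
# fixed before each fit to make the runs comparable.
set.seed(1); fit_a <- lmrob(y ~ x1 + x2 + b1, data = d)
set.seed(1); fit_b <- lmrob(y ~ x1 + x2 + b1, data = d)
set.seed(2); fit_c <- lmrob(y ~ x1 + x2 + b1, data = d)

# Same seed: the two fits should agree
print(all.equal(coef(fit_a), coef(fit_b)))

# Different seed: any remaining differences are due to the subsampling
print(coef(fit_a) - coef(fit_c))
```

If the differences between seeds are negligible compared with the
differences between lmrob and lmRob, the seed can be ruled out as the
main explanation.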

> - The defaults for the psi- or rho-functions used are different. If I
> remember correctly, lmRob uses the so-called "optimal" while lmrob
> uses the bisquare psi-function. While "optimal" is clearly worse, both
> are prone to produce somewhat unstable estimates for small datasets
> (if you draw sensitivity curves of the initial estimates you will get
> non-smooth curves with many jumps, etc). That's why we advocate using
> psi-functions that redescend more slowly, like "lqq" (only available
> for lmrob). That's another source of differences.

OK, acknowledged, that makes sense. I have a dataset of 600 cases and am
running models with 16 variables and 3 interactions. Even so, as I said,
I think the main differences I have found so far are due to how each
function handles the mix of different variable types.
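
To see how much the choice of psi-function matters within lmrob itself,
one could fit the same model with the default "bisquare" and with "lqq"
and put estimates and standard errors side by side. This is a rough
sketch, again on simulated data with made-up variable names; it assumes
the psi argument of lmrob.control() as documented in robustbase.

```r
library(robustbase)

# Simulated data (hypothetical), smaller sample with a few outliers
set.seed(7)
n <- 120
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), b1 = rbinom(n, 1, 0.5))
d$y <- 1 + 2 * d$x1 - d$x2 + 0.5 * d$b1 + rnorm(n)
d$y[1:6] <- d$y[1:6] + 12

# Same seed for both fits, so only the psi-function differs
set.seed(1)
fit_bisq <- lmrob(y ~ x1 + x2 + b1, data = d,
                  control = lmrob.control(psi = "bisquare"))
set.seed(1)
fit_lqq  <- lmrob(y ~ x1 + x2 + b1, data = d,
                  control = lmrob.control(psi = "lqq"))

# Estimates and standard errors side by side
cmp <- cbind(
  est_bisquare = coef(fit_bisq),
  est_lqq      = coef(fit_lqq),
  se_bisquare  = summary(fit_bisq)$coefficients[, "Std. Error"],
  se_lqq       = summary(fit_lqq)$coefficients[, "Std. Error"]
)
print(round(cmp, 3))
```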

> BTW: Have you considered using the config option setting="KS2011" in
> lmrob? This is an alternative set of default options using the "lqq"
> psi-function and SMDM-estimates instead of MM-estimates. This will
> give you better tests, especially if you have many predictors and a
> not so large dataset.

Cool, I will try this, as it speaks to my case: a not so large dataset
and not so small models.
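
For the record, this is roughly how I understand the suggestion. The
sketch uses the coleman data that ships with robustbase as a stand-in
for a small dataset with several predictors; it is not my data, and the
comparison with the default fit is only for illustration.

```r
library(robustbase)

# Small example dataset from robustbase (20 observations, 5 predictors)
data(coleman, package = "robustbase")

# Default settings: MM-estimate with the bisquare psi-function
set.seed(1)
fit_mm <- lmrob(Y ~ ., data = coleman)

# Alternative defaults suggested by Manuel: SMDM-estimate with "lqq"
set.seed(1)
fit_ks <- lmrob(Y ~ ., data = coleman, setting = "KS2011")

# Check which method and psi-function were actually used
print(fit_ks$control$method)
print(fit_ks$control$psi)

# Compare the inference from both fits
print(summary(fit_mm)$coefficients)
print(summary(fit_ks)$coefficients)
```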

Thanks a ton for the info, Manuel. And thanks everybody for the references.

FABRICIO

2012/8/30 Manuel Koller <koller using stat.math.ethz.ch>

> Hi Kaveh
>
> Martin already gave the reference to the paper where we develop the SMDM
> estimators. The second paper he mentions can be found on arxiv for now
> (arXiv:1208.5595v1, http://arxiv.org/abs/1208.5595). There you can
> find the algorithm we implemented in lmrob that can also compute
> S-estimates when there are categorical variables present.
>
> Best regards,
>
> Manuel
>
> On Thu, Aug 30, 2012 at 12:53 PM, Martin Maechler
> <maechler using stat.math.ethz.ch> wrote:
> >>>>>> "KV" == Kaveh Vakili <kaveh.vakili using wis.kuleuven.be>
> >>>>>>     on Thu, 30 Aug 2012 12:20:45 +0200 writes:
> >
> >     KV> Sorry I accidentally sent a draft message to the list
> >     KV> (maybe the heatwave).
> >
> >     KV> Below the correct version
> >
> >
> >
> >     KV> Dear Manuel,
> >
> >     KV> This all sounds very exciting, would you by any chance
> >     KV> have a link to a white paper/technical report where
> >     KV> these innovations are documented in greater detail
> >     KV> than in the help files associated with lmRob?
> >
> > I think you have understood that Manuel advocates using lmrob(),
> > [not lmRob() !!]
> >
> > So please do look at  ?lmrob
> > It has several references, most notably to
> >
> >   Koller, M. and Stahel, W.A. (2011), Sharpening Wald-type inference
> >   in robust regression for small samples, _Computational Statistics
> >   & Data Analysis_ *55*(8), 2504-2515.
> >
> > For newer results, Manuel will provide newer links.
> >
> > Unfortunately, stupid editorial processes lead to papers not
> > being published just because they want to publicize a simple
> > good idea for a new algorithm that has not been found by anyone
> > in 20-30 years history, but the editor/referees still claim the
> > idea to be too simple to be worth publishing .....
> >
> > Martin
> >
> >     KV> Best regards,
> >
> >
> >     KV> On 08/30/2012 12:04 PM, Kaveh Vakili wrote:
> >     >>
> >     >> Dear Manuel,
> >     >>
> >     >> This all sounds very exciting, would you by any chance
> >     >> have a link to a white paper where these numerous
> >     >> innovations are explained more in detail than in the
> >     >> lmRob help files?
> >     >>
> >     >> Best regards,
> >     >>
> >     >>
> >     >>
> >     >>
> >     >> On 08/30/2012 08:26 AM, Manuel Koller wrote:
> >     >>> - lmrob uses randomized algorithms and thus produces slightly
> >     >>> different results for different seeds.
> >     >>
> >     >> _______________________________________________
> >     >> R-SIG-Robust using r-project.org mailing list
> >     >> https://stat.ethz.ch/mailman/listinfo/r-sig-robust
> >
>
>
>
> --
> Manuel Koller <koller using stat.math.ethz.ch>
> Seminar für Statistik, HG G 18, Rämistrasse 101
> ETH Zürich  8092 Zürich  SWITZERLAND
> phone: +41 44 632-4673 fax: ...-1228
> http://stat.ethz.ch/people/kollerma/
>
