[RsR] Maximum number of variables with lmRob? Error: singular matrix encountered

Manuel Koller koller at stat.math.ethz.ch
Thu Aug 30 08:26:14 CEST 2012


Dear Fabricio

Great that it works with lmrob :-). Are the differences really that
"substantial"? Even when taking the standard error into account?
Of course, you would expect "some" differences, because the two
methods do not use the exact same algorithms and also use different
default configuration parameters:

- For datasets with categorical and continuous variables, lmRob uses
an M-S-estimate as initial estimate (replacing the S-estimate). It is
basically an L1-estimate for the categorical part and an S-estimate
for the continuous part. lmrob does not need to split the estimation
into categorical and continuous parts; it just computes an S-estimate.
That's one source of differences.

- The defaults for the psi- or rho-functions used are different. If I
remember correctly, lmRob uses the so-called "optimal" psi-function
while lmrob uses the bisquare psi-function. While "optimal" is clearly
worse, both are prone to produce somewhat unstable estimates for small
datasets (if you draw sensitivity curves of the initial estimates you
will get non-smooth curves with many jumps, etc.). That's why we
advocate using psi-functions that redescend more slowly, like "lqq"
(only available for lmrob). That's another source of differences.

- lmrob uses randomized algorithms and thus produces slightly
different results for different seeds.
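
A minimal sketch of how one could check the last two points within
lmrob itself. It assumes the robustbase package is installed and uses
the built-in stackloss data as a stand-in (not your dataset); the seed
value is arbitrary. It fixes the seed to get reproducible fits and
then expresses the bisquare vs. "lqq" coefficient differences in units
of the standard error:

```r
library(robustbase)  # provides lmrob(); stackloss ships with base R

# Same seed, same fit: the random subsampling behind the initial
# S-estimate is driven by R's random number generator.
set.seed(2012)  # arbitrary seed, for illustration only
fit_a <- lmrob(stack.loss ~ ., data = stackloss)
set.seed(2012)
fit_b <- lmrob(stack.loss ~ ., data = stackloss)
same_seed_equal <- isTRUE(all.equal(coef(fit_a), coef(fit_b)))

# Same model with the more slowly redescending "lqq" psi-function.
fit_lqq <- lmrob(stack.loss ~ ., data = stackloss,
                 control = lmrob.control(psi = "lqq"))

# Relate the coefficient differences to the standard errors.
se <- coef(summary(fit_a))[, "Std. Error"]
print(round(cbind(default    = coef(fit_a),
                  lqq        = coef(fit_lqq),
                  diff_in_se = (coef(fit_lqq) - coef(fit_a)) / se), 3))
```

Differences that are small relative to the standard errors are not
really "substantial" in the sense asked about above.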

BTW: Have you considered using the config option setting="KS2011" in
lmrob? This is an alternative set of default options using the "lqq"
psi-function and SMDM-estimates instead of MM-estimates. This will
give you better tests, especially if you have many predictors and a
not so large dataset.
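
For illustration, a short sketch of such a call, again assuming
robustbase is installed and using the built-in stackloss data as a
placeholder for your own model and data:

```r
library(robustbase)

set.seed(2012)  # arbitrary seed, only to make the fits reproducible

# Default settings (MM-estimate).
fit_mm <- lmrob(stack.loss ~ ., data = stackloss)

# Alternative defaults: "lqq" psi-function and SMDM-estimate.
fit_ks <- lmrob(stack.loss ~ ., data = stackloss, setting = "KS2011")

# The coefficient table holds the estimates, standard errors and tests.
print(summary(fit_ks))
```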

Best regards,

Manuel

On Wed, Aug 29, 2012 at 9:37 PM, Fabricio Vasselai
<fabriciovasselai using gmail.com> wrote:
> Dear all, thank you very much for your replies so far.
>
> First, about my dataset, I am not allowed to publish it here yet. But I am
> arranging to simulate something similar to show you a data-based example of
> the problem.
>
> Manuel: you just gave me a good idea. And indeed, the problem does not
> happen when I use lmrob! Awesome. The only problem is: when I run smaller
> models, with 10 variables for instance, so that lmRob does work for me, the
> results from lmRob and from lmrob are substantially different. Sorry for the
> silly question, but I would be very interested in understanding why those
> two versions of the package could give such a difference. It could be very
> interesting for answering many theoretical questions.
>
> S.Ellison: quite right, it would make sense. But the problem still happens
> with only one interaction, unfortunately.
>
> Thanks a lot for the insights so far.
>
> FABRICIO
>
>

-- 
Manuel Koller <koller using stat.math.ethz.ch>
Seminar für Statistik, HG G 18, Rämistrasse 101
ETH Zürich  8092 Zürich  SWITZERLAND
phone: +41 44 632-4673 fax: ...-1228
http://stat.ethz.ch/people/kollerma/



