[RsR] function naming for "robust ..."

Thu Jan 19 18:30:34 CET 2006

I'm finally taking up this thread from before Christmas:

>>>>> "MMa" == Martin Maechler <maechler using stat.math.ethz.ch>
>>>>>     on Thu, 22 Dec 2005 12:12:50 +0100 writes:

>>>>> "Matias" == Matias Salibian-Barrera <matias using stat.ubc.ca>
>>>>>     on Wed, 21 Dec 2005 14:07:55 -0800 writes:

    MMa> .........

    Matias> A few days ago I uploaded the package "roblm" to CRAN.
    Matias> ........

    Matias> Note that the name of the package (and the
    Matias> corresponding class) is "roblm", but this does not
    Matias> mean that I necessarily prefer this name over
    Matias> others. I've been working on this for some time now,
    Matias> and for the reasons that I mentioned in Treviso
    Matias> ("rlm" and "lmRob" are already taken) I settled for
    Matias> roblm.

    MMa> yes. In Treviso, we had talked about trying to go for "rlm" and
    MMa> I had offered to try to resolve the name clash with MASS::rlm().
    MMa> I now think that this may not be a good idea:  rlm()
    MMa> in MASS is really relatively prominent in the MASS book with
    MMa> years of tradition, and usage in many places.  Also its default
    MMa> method is "M", not "MM" { for a good reason: backward compatibility
    MMa> with older versions of rlm()}.
    MMa> Hence our new  (robust|rob|rf|r)lm(Rob) function would never be
    MMa> 'call-compatible' with MASS::rlm(), and for that reason I think we
    MMa> should strive for a different naming scheme.

    MMa> Andreas Ruckstuhl had raised the point already at Treviso, and
    MMa> on this list (Dec 7, "Re: [RsR] OGK covariance estimator") where
    MMa> he'd proposed
    MMa> r*   [as 'rlm' in MASS]
    MMa> rob* [as 'roblm' above]
    MMa> rf*  ["[R]obust [F]itting of ..", used in Andreas' package]

    MMa> The last one may be a bit more logical than all the others:
    MMa> in one sense there's just one linear model with different
    MMa> fitting methodologies, since
    MMa> the error distribution hasn't been part of the
    MMa> model specification __for most statistical software__.

    MMa> OTOH, "rob" is easy to pronounce/spell
    MMa> [OTOH, there's all the people called 'Rob' ...]

    MMa> As you can guess, from Andreas' list I'd take either
    MMa> 'rob*' or 'rf*' and don't have a strong preference.

In the meantime, I have talked with Andreas, and he has agreed
to go for 'rob*' rather than 'rf*', which he had preferred for good
reasons.

Since there were no further comments or suggestions,
I think we've settled on adopting the
   prefix  'rob'

for robustified versions of functions like lm(), glm(), etc.; hence we
will have roblm(), robglm(), ...

==> We (well, Andreas, Matías and I, to a large extent) can now
collaborate on moving parts of Matías' "roblm" and Andreas'
unpublished "RobFit" packages into robustbase,
using the function names roblm() and robglm(), which return objects
of class 'roblm' or 'robglm', respectively.

Ideas and suggestions on this topic are very welcome!

The question of how to choose good function names for a
robustified cov() is a bit less trivial,
since

  1) there are even more methods around;

  2) in S and R, there have been diverging naming traditions:
    MASS has  cov.rob() but also cov.trob(), cov.mcd() and cov.mve(),
    where only the latter two are also available through cov.rob()
    (via its 'method' argument);

  3) of course, the "." convention is currently somewhat deprecated
     {since it leads to confusion with S3 method definitions}.

I think our aim should eventually be to go for
robcov() with many possible 'method = "...."' arguments
--------
and probably also to have it return an S4 class object, along the
lines the working group "Multivariate" proposed in Treviso.
OTOH, I'd also like to add Valentin's good  covMcd() {and also ltsReg()}
functions to the  "robustbase" package as soon as possible,
and for now I'd propose to just keep covMcd() "as is"
and later have
    robcov(*, method = "MCD")            or rather
    robcov(*, method = "fastMCD1999")
call Valentin's  covMcd().
Apropos the method name: now that I've seen the algorithm again,
I'm pretty much convinced that we'll see new (at least slightly
new) fast MCD algorithms in the future, which is why a dated name
such as "fastMCD1999" may be preferable to a plain "MCD".
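To make the proposed 'method = ...' dispatch a bit more concrete, here is a
minimal R sketch. It assumes only robustbase's covMcd(); the robcov()
function itself is hypothetical (not an existing robustbase API), the
method names "MCD" and "fastMCD1999" come from the proposal above, and
"classical" is an illustrative extra added just for comparison.

```r
## Hypothetical sketch: robcov() is NOT part of robustbase; it only
## illustrates a 'method'-based front end that dispatches to covMcd().
library(robustbase)  # provides covMcd()

robcov <- function(x, method = c("fastMCD1999", "MCD", "classical"), ...) {
  method <- match.arg(method)
  switch(method,
         ## empty alternative falls through: "fastMCD1999" and "MCD"
         ## both call the current fast-MCD implementation
         "fastMCD1999" = ,
         "MCD"         = covMcd(x, ...),
         ## non-robust reference estimate (illustrative only)
         "classical"   = list(center = colMeans(x), cov = cov(x)))
}

## Usage:
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
r <- robcov(x, method = "MCD")
r$center
r$cov
```

With this layout, a future "MCD" could be re-pointed at a newer algorithm,
while "fastMCD1999" would keep selecting today's covMcd().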

BTW, a draft version 0.0-1 of "robustbase" is
around (and inspectable from https://svn.r-project.org/R-packages/).
In particular, it contains many data sets to be used as examples
(most of them from Valentin Todorov), but also Rousseeuw/Croux'
Sn() and Qn() and the OGK covariance estimator (basically the code
from Kjell Konis).

In order to get comments, I plan to also make the package available
>>> as a source package only <<<  (i.e. *.tar.gz)
for those who know how to install packages from source.
Since there are too many ``TO DO''s in there, I think it should
not go to CRAN initially: once it is widely used, we can no longer
easily modify the API (function arguments and result values/classes).
OTOH, there are good reasons to aim for a CRAN version of the
package in spring, although even then it will probably still be
version 0.x-y.

Martin Maechler,
ETH Zurich