[RsR] covrob --- some OOP-comments

Martin Maechler <maechler using stat.math.ethz.ch>
Mon Mar 27 14:41:20 CEST 2006


>>>>> "ValenT" == Valentin Todorov <valentin.todorov using chello.at>
>>>>>     on Sat, 25 Mar 2006 11:18:49 +0100 writes:

    ValenT> ----- Original Message ----- 
    ValenT> From: "Martin Maechler" <maechler using stat.math.ethz.ch>
    ValenT> ...

    >> I assume that at least the base class(es) ('Cov', 'CovR' in
    >> Valentin's naming scheme) should be put into 'robustbase'  ASAP,
    >> so other packages can import them from the robustbase namespace
    >> and extend (aka "inherit from") them.

    ValenT> I have uploaded a new version of rrcov with
    ValenT> constrained M-estimates of location and scatter
    ValenT> covMest(), for test purposes, which still returns an
    ValenT> S3 class. Now I am implementing Cov, CovR and Mest,
    ValenT> which is derived from them; I want to upload these in
    ValenT> the next few days. As soon as this construction is stable,
    ValenT> I'll move all three of them to robustbase (Martin, I
    ValenT> rely on your support). 

sure.
As I alluded to earlier, I'd also be happy for you to get direct
write access to the Subversion repository, i.e. the one
behind https://svn.R-project.org/R-packages/trunk/robustbase

    ValenT> After that I'll port covMcd to return an S4 class derived from CovR.

    ValenT> Two questions arise:

    ValenT> - Is there some "standard" for naming classes? I
    ValenT> assume the usual one of starting with a capital letter? In
    ValenT> some cases one can go further and select a
    ValenT> particular capital letter. For example in Visual
    ValenT> C++/MFC every class started with a capital C (in our
    ValenT> case we would have Ccov, CcovR, Cmcd, Cogk, Cmest).

There are "some" standards, but not endorsed officially; 
particularly there is no capitalization or "prefix"
standard.  In a 'function based' OO system like S4, the classes
are a bit less visible than in a 'class based' system like C++/Java.
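
To illustrate the 'function based' point, here is a minimal sketch. The
class name 'Cov' follows Valentin's naming; the generic dimension() and
the slot layout are hypothetical and only serve the illustration. The
method is registered on the generic function, not stored inside the
class, and the user only ever calls the function.

```r
library(methods)

# Hypothetical slot layout, for illustration only.
setClass("Cov", representation(center = "numeric", cov = "matrix"))

# The generic is an ordinary function; the method is attached to it,
# keyed by the class of the argument, rather than living in the class.
setGeneric("dimension", function(object) standardGeneric("dimension"))
setMethod("dimension", "Cov", function(object) length(object@center))

obj <- new("Cov", center = c(0, 0), cov = diag(2))
dimension(obj)                     # the call never mentions the class name
existsMethod("dimension", "Cov")   # the method is looked up via the generic
```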

Several of the more classical R packages that have been using S4 
use simple all-lowercase-alphabet class names such as 'mle' or
'pixmap' but also 'sparseMatrix'.  The one ``rule'' that I think
"everyone" agrees on is  that the creator function, particularly of
a ``principal'' class, should have the identical
name as the class it creates. E.g. mle() returns S4 objects of
class 'mle', Matrix() returns objects inheriting from class
"Matrix", etc.  However, even this rule is sometimes not
practical for diverse reasons, typically name clashes with
already existing functionality in R (possibly in other packages, etc).
One of the main reasons that IMO it doesn't make sense to try to
impose such standards is the fact that S (and hence R) has a
history of more than 20 years, and people have wanted to stay
backward compatible as much as possible when providing new facilities.

If we try to adhere to the only "agreed upon" standard above, our
class would need to be called  "covrob";  its super class (which
conceptually also contains the classical non-robustly estimated
covariance structures) could well be called "cov", even though
the standard cov() function does not return classed objects.
Further thinking about this directly leads to Valentin's 2nd
question: 
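
A minimal sketch of that naming scheme, under loud assumptions: the slot
names (center, cov, n.obs, method) are hypothetical, and the "estimator"
inside covrob() is only a placeholder (coordinate-wise medians plus the
classical covariance), not a real robust method. It shows nothing more
than a class "covrob" extending "cov" and a creator function named after
its class.

```r
library(methods)

# Superclass for any location/scatter estimate, classical or robust.
# Slot names are assumptions made for this sketch.
setClass("cov",
         representation(center = "numeric", cov = "matrix", n.obs = "integer"))

# Robust estimates extend "cov" and record which estimator produced them.
setClass("covrob", contains = "cov",
         representation(method = "character"))

# Creator function with the same name as the class it returns.
# PLACEHOLDER computation: medians and classical covariance only.
covrob <- function(x, method = "placeholder") {
  x <- as.matrix(x)
  new("covrob",
      center = apply(x, 2, median),
      cov    = stats::cov(x),
      n.obs  = nrow(x),
      method = method)
}

set.seed(1)
fit <- covrob(matrix(rnorm(40), ncol = 2))
is(fit, "covrob")   # object of the class named like its creator
is(fit, "cov")      # and it inherits from the superclass
```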

    ValenT> - What happens with the user of, for example
    ValenT> covMcd() when it begins to return an S4 class "Mcd"
    ValenT> instead of the current S3 "mcd". 

    ValenT> Of course those who just use print/plot/summary
    ValenT> will not notice the change, but what about those
    ValenT> who use the returned object within their programs?

Very good point that has also come to my mind when contemplating
your proposed inheritance / class hierarchy:

All the users' code / scripts / functions that rely on the
current structure of the result of, say,  covMcd(),  will stop
working correctly.

    ValenT> This is actually a general question on compatibility.

Indeed!

One approach that I usually favor is to require new function
names for getting the new-class results *and*
keep the old functions returning a back-compatible result; in
the present case keep covMcd() or covOGK() returning the lists
(and maybe S3 class) they currently return.

That would be one argument in favor of having only covrob() return an
S4 class while all the underlying "method functions" return lists
(possibly with an S3 class) -- just about what Peter and Heinz
have been proposing.  Additionally one could have newly named
functions {say 'covMCD'} that call the same underlying "method
functions" as covrob, e.g. call 'covMcd(..)', and only covMCD
would return an S4 class "covMCD" which extends (or "inherits
from") "covrob".
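
A rough sketch of this back-compatible layout. To keep it
self-contained, legacyCovMcd() is a hypothetical stand-in for the
existing covMcd(): it returns a list with S3 class "mcd" but computes
classical estimates, not a real MCD. The slot names of "covrob" are
assumptions as well.

```r
library(methods)

# HYPOTHETICAL stand-in for the existing covMcd(): a plain list with
# S3 class "mcd". Classical estimates are used instead of a real MCD.
legacyCovMcd <- function(x) {
  x <- as.matrix(x)
  structure(list(center = colMeans(x), cov = stats::cov(x)), class = "mcd")
}

# S4 side: "covMCD" extends "covrob" (slot names are assumptions).
setClass("covrob",
         representation(center = "numeric", cov = "matrix", method = "character"))
setClass("covMCD", contains = "covrob")

# Newly named function: calls the old "method function" and wraps its
# result in the S4 class. The old function itself is left untouched.
covMCD <- function(x) {
  old <- legacyCovMcd(x)
  new("covMCD", center = old$center, cov = old$cov, method = "MCD")
}

set.seed(1)
x <- matrix(rnorm(40), ncol = 2)
fit3 <- legacyCovMcd(x)   # existing scripts keep using fit3$center, fit3$cov
fit4 <- covMCD(x)         # new code uses the S4 object
is(fit4, "covrob")
```

Only code that opts in by calling the new name sees the S4 object, so
nothing that depends on the list structure has to change.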

The completely alternative approach would be to declare that all
current users of 'rrcov' (or 'robustbase') should change their
scripts and functions whenever they start using the
new-generation version of "robustbase" and start using the slots
(or accessor functions where we provide them) of the new
S4-classed return values.
This 2nd ``brutal'' approach (of non-compatible upgrade) is
possible in situations where not too many users of the current
package exist, and they basically agree to do the extra work of
upgrading their R scripts.
Personally, I'm  *very*  reluctant to break compatibility
-- though I agree it has to happen sometimes in order
to not hinder progress.
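
If accessor functions are provided, one possible way to soften such an
upgrade is to give them methods for both the old S3 lists and the new
S4 objects. The following is only a sketch: the accessor names
getCenter() and getCov() and the slot names are hypothetical, and the
S3 class "mcd" is the one mentioned above.

```r
library(methods)

# New-style S4 class (slot names are assumptions for this sketch).
setClass("covrob", representation(center = "numeric", cov = "matrix"))

# Accessor generics with hypothetical names.
setGeneric("getCenter", function(obj) standardGeneric("getCenter"))
setGeneric("getCov",    function(obj) standardGeneric("getCov"))

# Methods for the S4 class read the slots ...
setMethod("getCenter", "covrob", function(obj) obj@center)
setMethod("getCov",    "covrob", function(obj) obj@cov)

# ... and methods for the registered S3 class read the list components,
# so the same user code works on both kinds of result.
setOldClass("mcd")
setMethod("getCenter", "mcd", function(obj) obj$center)
setMethod("getCov",    "mcd", function(obj) obj$cov)

s4fit <- new("covrob", center = c(1, 2), cov = diag(2))
s3fit <- structure(list(center = c(1, 2), cov = diag(2)), class = "mcd")
identical(getCenter(s4fit), getCenter(s3fit))
identical(getCov(s4fit), getCov(s3fit))
```

User scripts written against the accessors would then not need to know
whether they hold a list or an S4 object.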

Martin