[RsR] covrob --- some OOP-comments

Martin Maechler maechler at stat.math.ethz.ch
Fri Mar 24 18:44:25 CET 2006


>>>>> "PetRd" == Peter Ruckdeschel <Peter.Ruckdeschel at uni-bayreuth.de>
>>>>>     on Fri, 24 Mar 2006 14:51:37 +0100 writes:


    PetRd> On Thu, 23 Mar 2006, Valentin Todorov wrote:
    >> On Thu, 23 Mar 2006, Peter Ruckdeschel wrote:
    >>> On Thu, 23 Mar 2006, Valentin Todorov wrote:

    >>>> - use of accessor methods. ...

    PetRd> [snip]

    >>> * write an accessor function 'distance' (i.e. a method under a
    >>> corresponding S4 generic) which first checks whether 'distance'
    >>> is NULL and, if so, writes the values into the parent frame,
    >>> something like
    >>> eval.parent(substitute( object@distance <- distanceValues ))
    >> 
    >> 
    >> Yes, everything is in the last line, but I still do not know which object
    >> will be modified.

    PetRd> sorry for having been too imprecise here: 'object' was meant to be the
    PetRd> argument of class 'covstruct' of the function 'distance' (resp.
    PetRd> getDistance), on which 'distance' is dispatched by the S4 mechanism.
    PetRd> The evaluation in the parent frame should imitate the
    PetRd> call-by-reference passing of arguments as in your sketch of C++ code
    PetRd> below:

    >> In C++ and friends I would define
    >> dist=null;
    >> in the constructor and then write a get-accessor like this
    >> 
    >> vector getDistance(){
    >>     if(dist==null)
    >>         dist = computeDistance();
    >>     return dist;
    >> }
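
A minimal R sketch of the lazy accessor discussed above, under stated
assumptions: the class 'CovSketch', its slots and the generic
'getDistance' are hypothetical names chosen for illustration, not the
rrcov or robustbase API. The write-back only makes sense when the caller
passes a plain variable name, because the assignment is evaluated in the
caller's frame.

```r
library(methods)

# Hypothetical class, for illustration only.
setClass("CovSketch",
         representation(X = "matrix", center = "numeric",
                        cov = "matrix", distance = "numeric"))

setGeneric("getDistance", function(object) standardGeneric("getDistance"))

setMethod("getDistance", "CovSketch", function(object) {
    if (length(object@distance) == 0L) {
        # Not computed yet: compute the squared Mahalanobis distances ...
        d <- mahalanobis(object@X, object@center, object@cov)
        # ... and store them in the caller's copy of the object.
        eval.parent(substitute(object@distance <- d))
        return(d)
    }
    object@distance
})

set.seed(1)
X   <- matrix(rnorm(40), ncol = 2)
obj <- new("CovSketch", X = X, center = colMeans(X), cov = cov(X))
d1  <- getDistance(obj)   # computes the distances and caches them in 'obj'
d2  <- getDistance(obj)   # served from the 'distance' slot
```

Modifying the caller's object in this way departs from R's usual
pass-by-value semantics, so it is a design choice to weigh rather than a
default to adopt.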

    PetRd> [snip]

    >>> Are the methods displayed in Valentin's diagram (like show(),...)
    >>> meant to follow the S4 paradigm, i.e. dispatched according to a
    >>> generic, or are they meant to be functional slots (like in classical
    >>> OOP, e.g. C++)?
    >> 
    >> I had S4 classes in mind, but it is not that easy to express this in UML:
    >> particularly, the classes Cov, Mcd, Ogk and Mest are S4 classes returned by
    >> the functions cov, covMcd, covOgk and covMest respectively. Most of their
    >> methods - show/plot/summary - are implemented for Cov or the abstract CovR
    >> and can be applied polymorphically to the subclasses.

    PetRd> Thank you for this clarification.

yes, thank you, Valentin; 
that also sounds very well designed to me!

I agree that a class tree, i.e. inheritance, is a very good thing
here!
But I still think we should keep the covrob(...., method = "..")
approach in addition to your proposal,
if mainly for didactic / documentation reasons.
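
A rough sketch of how a class tree and a covrob(x, method = "..")
style wrapper might coexist. Everything here is an assumption made for
illustration: the class names, the wrapper name 'covrobTree' and the toy
"median" estimator are hypothetical and not taken from rrcov or
robustbase.

```r
library(methods)

# Hypothetical names: a virtual base class plus two subclasses.
setClass("CovBaseSketch",
         representation("VIRTUAL", center = "numeric", cov = "matrix"))
setClass("CovClassicSketch", contains = "CovBaseSketch")
setClass("CovMedSketch",     contains = "CovBaseSketch")

# One show() method on the base class serves every subclass.
setMethod("show", "CovBaseSketch", function(object) {
    cat("Object of class", class(object)[1], "\n")
    cat("Center:", format(object@center, digits = 3), "\n")
    invisible(NULL)
})

# Constructors, one per subclass (toy estimators only).
covClassicSketch <- function(x)
    new("CovClassicSketch", center = colMeans(x), cov = cov(x))
covMedSketch <- function(x) {
    s <- apply(x, 2, mad)
    new("CovMedSketch", center = apply(x, 2, median),
        cov = diag(s^2, length(s)))     # crude placeholder scatter
}

# The wrapper only selects the constructor; the result is still
# an object in the class tree.
covrobTree <- function(x, method = c("classic", "median")) {
    switch(match.arg(method),
           classic = covClassicSketch(x),
           median  = covMedSketch(x))
}

set.seed(1)
X   <- matrix(rnorm(60), ncol = 3)
fit <- covrobTree(X, method = "median")
fit   # printed by the show() method inherited from the base class
```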

    >>> [2] more general:
    >>> 
    >>> In the "pure" OOP setup as in Valentin's display,
    >> 
    >> ...
    >> 
    >> this needs further thinking and discussing...

    PetRd> This was just a suggestion anyway!
    PetRd> --- I was only wondering  how to cope with control structures
    PetRd> which may not so easily be organized in a hereditary class-hierarchy,
    PetRd> but (see below)

    >>> The same of course could be done in case of other control structures
    >>> like for the psi functions.
    >> 
    >> 
    >> have a look at my implementation of the psi-functions used for the
    >> constrained M-estimates in covMest() in the new version of rrcov that
    >> I uploaded yesterday
    >> (it is still in Incoming; I'll send it to you offline).


    PetRd> Now it's in the regular archive :-) --- and I have had a look into it:

    PetRd> In fact your code for covMest() shows how to cope with
    PetRd> varying control structures in a simpler and computationally
    PetRd> more efficient manner than with my proposed S4 classes!
    PetRd> (except for the passing of single parameters of control,
    PetRd> which then should be avoided)

    >>> On the other hand, we should keep in mind that all these dispatching
    >>> operations will cost some computation time --- so we should
    >>> thoroughly test whether this gain in generality is worth this extra
    >>> effort.
    >> 
    >> 
    >> very important. A straightforward, non-OOP implementation can be more
    >> than twice as fast as an S4 implementation.

    PetRd> I had imagined something like that -- but of course without
    PetRd> computational evidence!

    >> I'll provide some more computational evidence.

    PetRd> Thank you -- touché;

But I (and most R-core members in general) think that this
should not be taken as a reason to decide against using S4.
Some parts of the *current* S4 implementation are still
relatively slow, since most things happen at the interpreted
level (i.e. most is written in R) instead of at the compiled level
(i.e. written in C).  But that's just a property of the current
implementation and not an inherent property of the S4 OOP
paradigm (so-called "function-based" OOP as opposed to the
"class-based" OOP that C++ or Java use).

I'm strongly advocating that we stay with S4 and use proper S4
methods for the S4 classed objects {instead of adding function slots}.
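
The dispatch overhead mentioned above can be measured directly. The
following is only a sketch with hypothetical names ('plainCenter',
'DataSketch', 'centerOf'); the timings depend on the machine and on the
R version, so no particular ratio is claimed here.

```r
library(methods)

# Plain function versus an S4 method doing the same work.
plainCenter <- function(x) colMeans(x)

setClass("DataSketch", representation(X = "matrix"))
setGeneric("centerOf", function(object) standardGeneric("centerOf"))
setMethod("centerOf", "DataSketch", function(object) colMeans(object@X))

set.seed(1)
X   <- matrix(rnorm(200), ncol = 2)
obj <- new("DataSketch", X = X)

# Repeat each call many times so that the calling overhead is visible.
n      <- 2000L
tPlain <- system.time(for (i in seq_len(n)) plainCenter(X))[["elapsed"]]
tS4    <- system.time(for (i in seq_len(n)) centerOf(obj))[["elapsed"]]
print(c(plain = tPlain, S4 = tS4))
```

For an estimator whose numerical work dominates, the per-call overhead
measured this way is the relevant quantity to compare against the total
run time.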

--- --- 

Now a few further comments on the original Fritz+Filzmoser
proposal, http://www.statistik.tuwien.ac.at/rsr/groups/mva/Abstract.pdf,
which I found very stimulating; thank you Peter (and
Heinrich)!  I hope I am not repeating things that Peter
Ruckdeschel or Valentin already stressed and with which I agree.
Below, I will refer to page numbers sometimes (and I enumerate
my points just for easier reference): 

1. In general I think we should have a bit less "optional"
   parts; in particular, top of page 4, I think one could
   require 'method'.

2. Mahalanobis distances (=: MDs) [still in 2.2, p.4]:
   I think there are several important covrob() methods which
   work with MDs internally anyway; *and* recomputing them by
   the caller ("wrapper") is not really cheap.
   In these cases, I think we should provide a way for the
   method function to return the MDs, so that the "wrapping
   function", namely covrob(), can use these values
   directly instead of recomputing them later.

3. '3.1  arguments of covrob()' :
   x: should be a numeric matrix  *or a data frame*
      
   and that's trivial to implement:  
            if(!is.matrix(x))  x <- data.matrix(x)

4. [3.1] 'cor: logical indicating if correlation should also be returned'
   
   I don't think that's a good idea.  It complicates the whole
   design slightly but unnecessarily. 
   
   Rather, we should define a smart  cor()  (S4) method for our class
   which
    - uses an existing cor matrix ``if it's there''
    - calls cov2cor() otherwise  {and stores the corr.matrix with
       the object; with similar implementation as Peter R has
       explained for the 'md' / 'distance' slot and
       getDistance() accessor}.

5. [3.1] 'weight.quantile' (chisq quantile for computing 0/1
   weights based on the MDs):  Definitely not here, but rather
   provide, again, a method that computes 0/1 weights (*or other
   weights, which I would prefer*) from a covR object.

6. '3.2 output of covrob' (page 5)
   (which really describes the class returned.)

   No 'md.wt' slot:  1) computation is trivial;
                     2) why these weights and not others?

7. '4. Additional functions'  as was said before in this thread:

   - very nice graphics!

   - We should use proper S4 methods and not S3-like 
     <generic>.<class> functions
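
Points 1 to 5 above could fit together roughly as follows. This is a
sketch under stated assumptions, not the proposed implementation: the
class 'CovRSketch', the wrapper 'covrobSketch', the generics 'getCorr'
and 'getWeights', the argument 'prob' and the two toy method functions
are all hypothetical names. Storing the computed correlation matrix back
into the object is left out here.

```r
library(methods)

# Hypothetical class; the slot names follow the discussion above.
setClass("CovRSketch",
         representation(center = "numeric", cov = "matrix",
                        cor = "matrix", distance = "numeric"),
         prototype = prototype(cor = matrix(numeric(0), 0, 0)))

# Toy method functions. The second one needs the MDs internally and
# returns them; the first one does not (point 2).
fitClassic <- function(x) list(center = colMeans(x), cov = cov(x))
fitMedian  <- function(x) {
    center <- apply(x, 2, median)
    S <- diag(apply(x, 2, mad)^2, ncol(x))  # crude placeholder scatter
    list(center = center, cov = S, distance = mahalanobis(x, center, S))
}

covrobSketch <- function(x, method) {       # 'method' is required (point 1)
    if (!is.matrix(x)) x <- data.matrix(x)  # data frames accepted (point 3)
    method <- match.arg(method, c("classic", "median"))
    fit <- switch(method, classic = fitClassic(x), median = fitMedian(x))
    d <- fit$distance                       # reuse the MDs if already there
    if (is.null(d)) d <- mahalanobis(x, fit$center, fit$cov)
    new("CovRSketch", center = fit$center, cov = fit$cov, distance = d)
}

# Point 4: use a stored correlation matrix if there is one,
# otherwise derive it from the covariance matrix.
setGeneric("getCorr", function(object) standardGeneric("getCorr"))
setMethod("getCorr", "CovRSketch", function(object) {
    if (length(object@cor) > 0L) object@cor else cov2cor(object@cov)
})

# Point 5: 0/1 weights computed on demand from the stored MDs.
setGeneric("getWeights",
           function(object, prob = 0.975) standardGeneric("getWeights"))
setMethod("getWeights", "CovRSketch", function(object, prob = 0.975) {
    cutoff <- qchisq(prob, df = length(object@center))
    as.numeric(object@distance <= cutoff)   # distances are squared MDs
})

set.seed(1)
df  <- data.frame(a = rnorm(30), b = rnorm(30))
fit <- covrobSketch(df, method = "median")
getCorr(fit)
table(getWeights(fit))
```

With this layout neither a 'cor' argument nor an 'md.wt' slot is needed,
and other weighting schemes can be added as further methods.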

------------------

I assume that at least the base class(es) ('Cov', 'CovR' in
Valentin's naming scheme) should be put into 'robustbase'  ASAP,
so other packages can import them from the robustbase namespace
and extend (aka "inherit from") them.

May I remind everyone that there has been a new version, 0.1-4,
on CRAN for a while; unfortunately it is not available as a binary
package for Windows (because of some small C library
incompatibilities that have been resolved in the meantime).
In addition to the source package on CRAN, you can also always
look at the current development version of "robustbase" at
https://svn.R-project.org/R-packages/trunk/robustbase/

Something that has not happened yet is any integration of Matias
Salibian-Barrera's "roblm / lmrob()" functionality.

Martin Maechler, ETH Zurich



