[RsR] covrob --- some OOP-comments
Martin Maechler
m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Fri Mar 24 18:44:25 CET 2006
>>>>> "PetRd" == Peter Ruckdeschel <Peter.Ruckdeschel using uni-bayreuth.de>
>>>>> on Fri, 24 Mar 2006 14:51:37 +0100 writes:
PetRd> On Thu, 23 Mar 2006, Valentin Todorov wrote:
>> On Thu, 23 Mar 2006, Peter Ruckdeschel wrote:
>>> On Thu, 23 Mar 2006, Valentin Todorov wrote:
>>>> - using of accessor methods. ...
PetRd> [snip]
>>> *write an accessor function 'distance' (i.e. a method under a corresp.
>>> S4-generic)
>>> which in the first place checks if the 'distance' is NULL and if so
>>> writes
>>> the values into the parent frame--- something like
>>> eval.parent(substitute( object@distance <- distanceValues ))
>>
>>
>> Yes, everything is in the last line, but I still do not know which object
>> will be modified.
PetRd> sorry for having been too imprecise here: 'object' was meant to be the
PetRd> argument of
PetRd> class 'covstruct' of the function 'distance' (resp. getDistance) upon
PetRd> which 'distance'
PetRd> dispatched by the S4 mechanism.
PetRd> The evaluation in the parent frame should imitate the
PetRd> call-by-reference-passing
PetRd> of arguments as in your sketch of C++ code below:
>> In C++ and friends I would define
>> dist=null;
>> in the constructor and then write a get-accessor like this
>>
>> vector getDistance(){
>> if(dist==null)
>> dist = computeDistance();
>> return dist;
>> }
PetRd> [snip]
>>> Are the methods displayed in Valentin's diagram (like show(),...)
>>> meant to follow the S4 paradigm, i.e. dispatched according to a
>>> generic, or are they meant to be functional slots (like in classical
>>> OOP, e.g. C++)?
>>
>> I had S4 classes in mind, but it is not that easy to express this in UML:
>> particularly, the classes Cov, Mcd, Ogk and Mest are S4 classes returned by
>> the functions cov, covMcd, covOgk and covMest respectively. Most of their
>> methods - show/plot/summary - are implemented for Cov or the abstract CovR
>> and can be applied polymorphically to the subclasses.
PetRd> Thank you for this clarification.
yes, thank you, Valentin;
that also sounds very well designed to me!
I agree that a class tree, i.e. inheritance, is a very good thing
here!
But I still think we should keep the covrob(...., method = "..")
approach, in addition to your proposal,
if only for didactic / documentation reasons.
>>> [2] more general:
>>>
>>> In the "pure" OOP setup as in the Valentin's display,
>>
>> ...
>>
>> this needs further thinking and discussing...
PetRd> This was just a suggestion anyway!
PetRd> --- I was only wondering how to cope with control structures
PetRd> which may not so easily be organized in a hereditary class-hierarchy,
PetRd> but (see below)
>>> The same of course could be done in case of other control structures
>>> like for the psi functions.
>>
>>
>> have a look at my implementation of the psi-functions used for the
>> constrained M-estimates in covMest() in the new version of rrcov that
>> I uploaded yesterday
>> (still stays in Incoming, I'll send it you offline).
PetRd> Now it's in the regular archive :-) --- and I have had a look into it:
PetRd> In fact your code for covMest() shows how to cope with
PetRd> varying control structures in a simpler and computationally
PetRd> more efficient manner than with my proposed S4 classes!
PetRd> (except for the passing of single parameters of control,
PetRd> which then should be avoided)
>>> On the other hand, we should keep in mind that all these dispatching
>>> operations will cost some computation time --- so we should
>>> thoroughly test whether this gain in generality is worth this extra
>>> effort.
>>
>>
>> very important. A straightforward, non-OOP implementation can be more than
>> twice as fast as an S4 implementation.
PetRd> I had imagined something like that -- but of course without
PetRd> computational evidence!
>> I'll provide some more computational evidence.
PetRd> Thank you -- touché;
But I (and most R-core members in general) think that this
should not be taken to make a decision against using S4.
Some parts of the *current* S4 implementation are still
relatively slow, since most things happen on an interpreted
level (i.e. most is written in R) instead of a compiled level
(i.e. written in C). But that's just a property of the current
implementation and not an inherent property of the S4 OOP
paradigm (so-called "function-based" OOP, as opposed to the
"class-based" OOP that C++ or Java use).
I'm strongly advocating that we stay with S4 and use proper S4
methods for the S4 classed objects {instead of adding function slots}.
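To make the distinction concrete, here is a minimal sketch of a method that is defined once for a base class and dispatched for a subclass, without any function slots. The names CovBase, McdLike and describe are hypothetical and chosen only for this illustration; they are not the classes or generics of rrcov or robustbase.

```r
library(methods)

# Hypothetical base class and subclass (illustration only).
setClass("CovBase", representation(cov = "matrix", center = "numeric"))
setClass("McdLike", contains = "CovBase", representation(alpha = "numeric"))

# The behaviour lives in a generic plus a method, not in a slot of the object.
setGeneric("describe", function(object, ...) standardGeneric("describe"))
setMethod("describe", "CovBase", function(object, ...) {
    sprintf("%s: %d-dimensional scatter estimate",
            as.character(class(object))[1], ncol(object@cov))
})

# The method written for CovBase is found for the subclass by inheritance.
m <- new("McdLike", cov = diag(2), center = c(0, 0), alpha = 0.5)
describe(m)
```

A subclass that needs different behaviour would simply get its own setMethod("describe", "McdLike", ...) and the objects themselves stay pure data.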
--- ---
Now a few further comments on the original Fritz+Filzmoser
proposal http://www.statistik.tuwien.ac.at/rsr/groups/mva/Abstract.pdf
which I found very stimulating; thank you Peter (and
Heinrich)! I hope I am not repeating things that Peter
Ruckdeschel or Valentin already stressed and with which I agree.
Below, I will refer to page numbers sometimes (and I enumerate
my points just for easier reference):
1. In general I think we should have a bit less "optional"
parts; in particular, top of page 4, I think one could
require 'method'.
2. Mahalanobis distances (=: MDs) [still in 2.2, p.4]:
I think there are several important covrob() methods which
work with MDs internally anyway; *and* recomputing them by
the caller ("wrapper") is not really cheap.
In these cases, I think we should provide a way for the
method function to return the MDs, so that the "wrapping
function", namely covrob(), can use these values
directly instead of recomputing them.
3. '3.1 arguments of covrob()' :
x: should be a numeric matrix *or a data frame*
and that's trivial to implement:
if(!is.matrix(x)) x <- data.matrix(x)
4. [3.1] 'cor: logical indicating if correlation should also be returned'
I don't think that's a good idea. It complicates the whole
design slightly but unnecessarily.
Rather, we should define a smart cor() (S4) method for our class
which
- uses an existing cor matrix ``if it's there''
- calls cov2cor() otherwise {and stores the corr.matrix with
the object; with similar implementation as Peter R has
explained for the 'md' / 'distance' slot and
getDistance() accessor}.
5. [3.1] 'weight.quantile' (chisq quantile for computing 0/1
weights based on the MDs): Definitely not here, but rather
provide a method again that computes 0/1 weights -- (*or other
weights which I would prefer*) from a covR object.
6. '3.2 output of covrob' (page 5)
(which really describes the class returned.)
No 'md.wt' slot: 1) computation is trivial;
2) why these weights and not others?
7. '4. Additional functions' as was said before in this thread:
- very nice graphics!
- We should use proper S4 methods and not S3-like
<generic>.<class> functions
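Points 2 to 5 can be sketched together roughly as follows. This is only an illustration under assumptions: covrobSketch, classicalFit, getCorSketch, hardWeights and the class CovSketch are hypothetical names, not the proposed API. The correlation is cached in an environment slot, which is a simpler stand-in for the eval.parent() idea discussed earlier in the thread, and the distances stored are unsquared (mahalanobis() returns squared ones).

```r
library(methods)

# Hypothetical result class; 'cache' is an environment so that derived
# quantities can be stored by reference after the object was created.
setClass("CovSketch",
         representation(center = "numeric", cov = "matrix",
                        md = "numeric", cache = "environment"))

# Hypothetical method function: returns center and cov, optionally also 'md'.
classicalFit <- function(x) list(center = colMeans(x), cov = cov(x))

covrobSketch <- function(x, method = classicalFit) {
    if (!is.matrix(x)) x <- data.matrix(x)     # point 3: accept data frames
    fit <- method(x)
    md <- fit[["md"]]                          # point 2: reuse MDs if returned
    if (is.null(md))
        md <- sqrt(mahalanobis(x, fit$center, fit$cov))
    new("CovSketch", center = fit$center, cov = fit$cov,
        md = md, cache = new.env())
}

# Point 4: compute the correlation on first request, then reuse it.
setGeneric("getCorSketch",
           function(object, ...) standardGeneric("getCorSketch"))
setMethod("getCorSketch", "CovSketch", function(object, ...) {
    if (is.null(object@cache$cor))
        assign("cor", cov2cor(object@cov), envir = object@cache)
    object@cache$cor
})

# Point 5: 0/1 weights derived from the MDs by a method, not by covrob().
setGeneric("hardWeights",
           function(object, ...) standardGeneric("hardWeights"))
setMethod("hardWeights", "CovSketch",
          function(object, quantile = 0.975, ...) {
    cutoff <- sqrt(qchisq(quantile, df = ncol(object@cov)))
    as.numeric(object@md <= cutoff)
})

df <- data.frame(a = c(1, 2, 3, 4, 5, 20), b = c(2, 1, 4, 3, 6, -15))
fit <- covrobSketch(df)
getCorSketch(fit)
hardWeights(fit)
```

Other weight functions would then be further methods on the same object, and neither the 'cor' argument nor a 'md.wt' slot is needed in covrob() itself.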
------------------
I assume that at least the base class(es) ('Cov', 'CovR' in
Valentin's naming scheme) should be put into 'robustbase' ASAP,
so other packages can import them from the robustbase namespace
and extend (aka "inherit from") them.
May I remind everyone that there has been a new version, 0.1-4,
on CRAN for a while; unfortunately it is not available as a binary
package for Windows (because of some small C library
incompatibilities that have been resolved in the meantime).
In addition to the source package on CRAN, you can also always
look at the current development version of "robustbase" at
https://svn.R-project.org/R-packages/trunk/robustbase/
Something that has not happened yet is any integration of Matias
Salibian-Barrera's "roblm / lmrob()" functionality.
Martin Maechler, ETH Zurich