[Rd] validObject() -> slow down ?! [was "package:Matrix handling ..."]

Mon Jul 10 12:30:44 CEST 2006

[Diverted from R-help to R-devel]

>>>>> "roger" == roger koenker <roger at ysidro.econ.uiuc.edu>
>>>>>     on Sun, 9 Jul 2006 12:31:16 -0500 writes:

    >>
    roger> On 7/8/06, Thaden, John J <ThadenJohnJ at uams.edu>
    roger> wrote:

    >> As there is nothing inherent in either compressed
    >> sparse format that would prevent recognition and
    >> handling of duplicated index pairs, I'm curious why the
    >> dgCMatrix class doesn't also add x values in those
    >> instances?

    roger> why not multiply them?  or take the larger one, or
    roger> ...?  I would interpret this as a case of user
    roger> negligence -- there is no "natural" default behavior
    roger> for such cases.
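To make the ambiguity concrete, here is a small base-R sketch. The triplet vectors are made-up example data, not taken from the original report. It collapses a duplicated (i, j) pair under three different rules. Each rule gives a different value for that entry, and nothing in the storage format itself favours one of them.

```r
## Hypothetical triplet (i, j, x) data: the pair (row 1, col 2) is stored twice
i <- c(1L, 2L, 1L, 3L)
j <- c(2L, 1L, 2L, 3L)
x <- c(10, 4, 5, 7)

key <- paste(i, j, sep = ",")   # one key per (row, col) pair

## Three equally plausible rules for collapsing the duplicated pair
rules <- list(sum = sum, prod = prod, max = max)
combined <- sapply(rules, function(f) tapply(x, key, f))
print(combined)
```

Only the (1, 2) entry depends on the rule: it becomes 15, 50 or 10, while the non-duplicated entries are the same in every column.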

    roger> On Jul 9, 2006, at 11:06 AM, Douglas Bates wrote:

    >> Your matrix Mc should be flagged as invalid.  Martin and
    >> I should discuss whether we want to add such a test to
    >> the validity method.  It is not difficult to add the test
    >> but there will be a penalty in that it will slow down all
    >> operations on such matrices 

hmm, maybe "all operations" is slightly pessimistic.
The issue seems to be *when* (under what exact circumstances)
the 'validity' method for a class will be called, i.e., when the
equivalent of  validObject(<obj>) should be called automatically.
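For reference, here is a minimal sketch of the current behaviour, assuming the methods package's default initialize() method and a hypothetical class posNum. When new() is given slot arguments, it runs the validity method. A later slot assignment with @<- only checks the slot's class. An object can therefore become invalid silently, and this stays unnoticed until validObject() is called explicitly.

```r
library(methods)

## Hypothetical class with a validity method
setClass("posNum", slots = c(v = "numeric"),
         validity = function(object)
           if (any(object@v < 0)) "v must be non-negative" else TRUE)

a <- new("posNum", v = 1)   # slots supplied: the default initialize() calls validObject()
rejected <- tryCatch(new("posNum", v = -1), error = function(e) "rejected")

a@v <- -1                   # '@<-' checks only the slot's class, not the validity method
after <- tryCatch({ validObject(a); "valid" }, error = function(e) "invalid")
```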

We (those from R-core present) discussed this question a
bit last summer in Seattle, and Robert Gentleman proposed
that this behaviour should be better defined and documented,
and also slightly changed so that validObject() is called less
frequently.

IIRC, one consequence of that is the 'complete = FALSE' default
that validObject() has acquired in the meantime.  But I don't know
whether the other issue, ensuring that validObject() is not
called too often, has been addressed.
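A small sketch with two hypothetical classes shows what complete = FALSE skips. With the default, validObject() only checks that each slot holds an object of the declared class. With complete = TRUE, it also re-runs the validity methods of the slot objects themselves.

```r
library(methods)

## Hypothetical classes: 'outer' holds an 'inner' object in a slot
setClass("inner", slots = c(v = "numeric"),
         validity = function(object)
           if (any(object@v < 0)) "v must be non-negative" else TRUE)
setClass("outer", slots = c(child = "inner"))

o <- new("outer", child = new("inner", v = 1))
o@child@v <- -1             # nested slot assignment bypasses inner's validity method

check <- function(obj, complete)
  tryCatch({ validObject(obj, complete = complete); "passed" },
           error = function(e) "failed")
shallow <- check(o, complete = FALSE)  # default: slot objects' own validity is not re-run
deep    <- check(o, complete = TRUE)   # recurses into the 'child' slot
```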

I wonder if we should consider a new optional argument to
new(..) [ well, actually,  initialize() ]:

the default  new(.....,  .check.validity = TRUE)
would call {the equivalent of} validObject() after object
creation, but one could always explicitly use
	  new(.....,  .check.validity = FALSE)
for fast "but dangerous" object creation.
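The proposed switch can be emulated today with a wrapper. The sketch below uses a hypothetical helper newMaybeChecked() and a hypothetical class posNum. Neither the helper nor a .check.validity argument exists in the methods package. The helper builds the object from the class prototype and fills the slots with slot(..., check = FALSE) <- value. It runs validObject() only when asked.

```r
library(methods)

## Hypothetical class with a validity method
setClass("posNum", slots = c(v = "numeric"),
         validity = function(object)
           if (any(object@v < 0)) "v must be non-negative" else TRUE)

## Hypothetical helper emulating the proposed new(..., .check.validity = TRUE);
## it is NOT part of R's methods package.
newMaybeChecked <- function(Class, ..., .check.validity = TRUE) {
  obj <- new(Class)                        # prototype only: no slot args, no validity check
  args <- list(...)
  for (nm in names(args))
    slot(obj, nm, check = FALSE) <- args[[nm]]  # skips even the slot class check
  if (.check.validity) validObject(obj)    # optional full check after creation
  obj
}

fast <- newMaybeChecked("posNum", v = -1, .check.validity = FALSE)  # invalid, but created
checked <- tryCatch(newMaybeChecked("posNum", v = -1),
                    error = function(e) "rejected")
```

This is the "fast but dangerous" trade-off in its bluntest form. With .check.validity = FALSE, even a value of the wrong class would be stored without complaint.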

    >> and I'm not sure if we want to pay that price to catch a
    >> rather infrequently occurring problem.
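Here is a minimal sketch of the kind of test under discussion. It assumes a hypothetical stand-alone class cscSketch, which is not Matrix's dgCMatrix. Its slots mimic compressed sparse column storage: 0-based row indices i, column pointers p and values x. Requiring strictly increasing row indices within each column rules out duplicated (i, j) pairs. The check is a vectorised pass over the stored entries, so its cost grows linearly with the number of nonzeros and is paid every time the validity method runs.

```r
library(methods)

## Hypothetical minimal CSC class; NOT the Matrix package's dgCMatrix
setClass("cscSketch",
         slots = c(i = "integer", p = "integer", x = "numeric", Dim = "integer"),
         validity = function(object) {
           p <- object@p; i <- object@i
           nc <- object@Dim[2L]
           if (length(p) != nc + 1L) return("length(p) must equal ncol + 1")
           if (any(diff(p) < 0L)) return("p must be non-decreasing")
           if (length(i) != p[nc + 1L] || length(object@x) != length(i))
             return("i and x must both have length p[ncol + 1]")
           col <- rep.int(seq_len(nc), diff(p))   # column of each stored entry
           sameCol <- diff(col) == 0L              # consecutive entries in one column
           if (any(diff(i)[sameCol] <= 0L))
             return("row indices must be strictly increasing within each column")
           TRUE
         })

ok <- new("cscSketch", i = c(0L, 2L, 1L), p = c(0L, 2L, 3L),
          x = c(1, 2, 3), Dim = c(3L, 2L))
## Row 0 appears twice in column 1: a duplicated index pair
bad <- tryCatch(new("cscSketch", i = c(0L, 0L, 1L), p = c(0L, 2L, 3L),
                    x = c(1, 2, 3), Dim = c(3L, 2L)),
                error = function(e) conditionMessage(e))
```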

    roger> Elaborating the validity procedure to flag such
    roger> instances seems to be well worth the speed penalty in
    roger> my view.  Of course, anticipating every such misstep
    roger> imposes a heavy burden on developers and constitutes
    roger> the real "cost" of more elaborate validity checking.

At the moment I tend to agree with Roger that we (Matrix
authors) should try to add more stringent testing even at some
cost, particularly if that penalty would only occur at object
creation time. One important "use case" of our sparse matrices
is, of course, lmer() calls; these shouldn't become noticeably slower.

    roger> [My 2cents based on experience with SparseM.]

Martin