[Rd] validObject() -> slow down ?! [was "package:Matrix handling ..."]
Martin Maechler
maechler at stat.math.ethz.ch
Mon Jul 10 12:30:44 CEST 2006
[Diverted from R-help to R-devel]
>>>>> "roger" == roger koenker <roger at ysidro.econ.uiuc.edu>
>>>>> on Sun, 9 Jul 2006 12:31:16 -0500 writes:
>>
roger> On 7/8/06, Thaden, John J <ThadenJohnJ at uams.edu>
roger> wrote:
>> As there is nothing inherent in either compressed,
>> sparse, format that would prevent recognition and
>> handling of duplicated index pairs, I'm curious why the
>> dgCMatrix class doesn't also add x values in those
>> instances?
roger> why not multiply them? or take the larger one, or
roger> ...? I would interpret this as a case of user
roger> negligence -- there is no "natural" default behavior
roger> for such cases.
roger> On Jul 9, 2006, at 11:06 AM, Douglas Bates wrote:
>> Your matrix Mc should be flagged as invalid. Martin and
>> I should discuss whether we want to add such a test to
>> the validity method. It is not difficult to add the test
>> but there will be a penalty in that it will slow down all
>> operations on such matrices
hmm, maybe "all operations" is slightly pessimistic.
The issue seems to be *when* (under what exact circumstances)
the 'validity' method for a class will be called, i.e., when the
equivalent of validObject(<obj>) should be called automatically.
We (those from R-core present) discussed this question a
bit last summer in Seattle, and we had a proposal by Robert Gentleman,
that this should both be better defined and documented and also
slightly changed -- such that validObject() is called less
frequently.
IIRC, one consequence of that is the 'complete = FALSE' default
that validObject() has got in the meantime. But I don't know
about the other issue, of ensuring (or not) that validObject()
is not called too often.
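
To make the "when is it called" question concrete, here is a minimal
sketch. It assumes a toy class 'Sorted' invented for illustration (it is
not part of Matrix or any other package). It shows the two cases I am
reasonably sure of: new() with slots supplied runs the validity method,
whereas a direct slot assignment does not.

```r
library(methods)

# Toy class (hypothetical): a numeric vector that must be sorted increasingly.
setClass("Sorted", representation(x = "numeric"),
         validity = function(object) {
           if (is.unsorted(object@x)) "slot 'x' must be sorted" else TRUE
         })

# new() with slots supplied runs the validity method.
ok <- new("Sorted", x = c(1, 2, 3))

# An invalid slot value therefore makes new() signal an error.
res <- tryCatch(new("Sorted", x = c(3, 1, 2)),
                error = function(e) "invalid")

# Direct slot assignment only checks the class of the value,
# not the validity method ...
ok@x <- c(3, 1, 2)

# ... so the problem shows up only when validObject() is called explicitly.
chk <- validObject(ok, test = TRUE)   # character vector describing the problem
```

So any extra cost of a more thorough validity method is paid wherever
validObject() is actually invoked, not on every slot access.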
I wonder if we should consider a new optional argument to
new(..) [ well actually, initialize() ] :
the default new(....., .check.validity = TRUE)
would call {the equivalent of} validObject() after object
creation, but one could always explicitly use
new(....., .check.validity = FALSE)
for fast "but dangerous" object creation.
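
Such an argument does not exist in the methods package today. As a rough
sketch of how the idea could be emulated for one class, the following
defines an initialize() method with a '.check.validity' argument. The
class 'SortedVec' and the argument name are purely illustrative
assumptions, and the fast path simply fills the named slots without
calling validObject().

```r
library(methods)

# Toy class (hypothetical): a numeric vector that must be sorted increasingly.
setClass("SortedVec", representation(x = "numeric"),
         validity = function(object) {
           if (is.unsorted(object@x)) "slot 'x' must be sorted" else TRUE
         })

# '.check.validity' is an assumed argument name, not an existing R feature.
setMethod("initialize", "SortedVec",
          function(.Object, ..., .check.validity = TRUE) {
            if (.check.validity) {
              # Default path: the default method sets slots and validates.
              return(callNextMethod(.Object, ...))
            }
            # Fast "but dangerous" path: set the named slots directly,
            # skipping both the slot class check and validObject().
            args <- list(...)
            for (nm in names(args)) {
              slot(.Object, nm, check = FALSE) <- args[[nm]]
            }
            .Object
          })

# Checked creation fails on invalid input ...
res <- tryCatch(new("SortedVec", x = c(3, 1, 2)),
                error = function(e) "invalid")

# ... unchecked creation succeeds and returns an invalid object.
bad <- new("SortedVec", x = c(3, 1, 2), .check.validity = FALSE)
chk <- validObject(bad, test = TRUE)
```

A real implementation would presumably live in the default initialize()
method rather than in a per-class method like this one.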
>> and I'm not sure if we want to pay that price to catch a
>> rather infrequently occurring problem.
roger> Elaborating the validity procedure to flag such
roger> instances seems to be well worth the speed penalty in
roger> my view. Of course, anticipating every such misstep
roger> imposes a heavy burden on developers and constitutes
roger> the real "cost" of more elaborate validity checking.
At the moment I tend to agree with Roger that we (Matrix
authors) should try to add more stringent testing even at some
cost --- particularly if that penalty would only occur at object
creation time. One important "use case" of our sparse matrices
is, of course, lmer() calls. They shouldn't become noticeably slower.
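
For the duplicated index pairs that started this thread, the test itself
could look roughly like the sketch below. It uses base R only and a
hypothetical helper 'has_dup_entries'; it is not the Matrix package's
validity method. It assumes the compressed column layout, with 0-based
row indices 'i' and column pointers 'p' of length ncol + 1.

```r
# Hypothetical helper: TRUE if some (row, column) pair occurs more than once.
# i: 0-based row indices of the stored entries
# p: column pointers, length ncol + 1, with p[1] == 0
has_dup_entries <- function(i, p) {
  # Expand the pointers into an explicit column index for each stored entry.
  j <- rep.int(seq_len(length(p) - 1L), diff(p))
  # anyDuplicated() on a matrix looks for duplicated rows, i.e. (i, j) pairs.
  anyDuplicated(cbind(i, j)) > 0L
}

# Two columns; column 1 stores row 0 twice, column 2 stores row 1 once.
dup <- has_dup_entries(i = c(0L, 0L, 1L), p = c(0L, 2L, 3L))

# Same pointers, but distinct rows within column 1.
nodup <- has_dup_entries(i = c(0L, 2L, 1L), p = c(0L, 2L, 3L))

# The same row index in different columns is not a duplicate.
cross <- has_dup_entries(i = c(0L, 0L), p = c(0L, 1L, 2L))
```

The check touches every stored entry once, which is why it matters
whether it runs only at object creation or more often.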
roger> [My 2cents based on experience with SparseM.]
Martin