[R] package:Matrix handling of data with identical indices

Thaden, John J ThadenJohnJ at uams.edu
Sun Jul 9 20:53:48 CEST 2006


On Sunday, July 09, 2006 12:31 PM, Roger Koenker = RK
<roger at ysidro.econ.uiuc.edu> wrote 

RK> On 7/8/06, Thaden, John J <ThadenJohnJ at uams.edu> wrote:

    JT> As there is nothing inherent in either compressed, sparse,
    JT> format that would prevent recognition and handling of
    JT> duplicated index pairs, I'm curious why the dgCMatrix
    JT> class doesn't also add x values in those instances?

RK> why not multiply them?  or take the larger one, 
RK> or ...?  I would interpret this as a case of user
RK> negligence -- there is no "natural" default behavior
RK> for such cases.

This user created example data to illustrate his question, but
of course he faces real data (analytical-chemistry data in this
case) that happen to come with an 8.4% occurrence of non-unique
index pairs and also, quite literally, with a "natural" way
to treat such cases (the ~nature~ of the assay makes it correct to
sum them).  I can think of other natural data sets where
averaging would be the "natural" behavior.  So you are right
that there is no single "natural" default behavior; hence my
suggestion to leave that to user choice via a function argument
or class slot, defaulting to summing.
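
A minimal sketch of that suggestion, written in plain Python rather than
against the Matrix package: entries are held as (i, j, x) triplets, and a
`combine` argument (a hypothetical name, defaulting to summing) decides
what happens to values that share an index pair.  The function names and
the sample data are illustrative assumptions, not part of any package.

```python
from collections import defaultdict


def combine_duplicates(i, j, x, combine=sum):
    """Collapse triplets that share an (i, j) index pair using `combine`."""
    if not (len(i) == len(j) == len(x)):
        raise ValueError("i, j and x must have the same length")
    groups = defaultdict(list)
    for row, col, val in zip(i, j, x):
        groups[(row, col)].append(val)
    # Order by column, then row, as a column-compressed layout would.
    keys = sorted(groups, key=lambda rc: (rc[1], rc[0]))
    return [(r, c, combine(groups[(r, c)])) for r, c in keys]


def mean(values):
    """Alternative rule for data where averaging is the natural choice."""
    return sum(values) / len(values)


# Hypothetical data: the index pair (0, 0) occurs twice.
i = [0, 2, 0, 1]
j = [0, 1, 0, 1]
x = [1.0, 4.0, 2.0, 3.0]

summed = combine_duplicates(i, j, x)                  # default: sum
averaged = combine_duplicates(i, j, x, combine=mean)  # user's choice
```

Passing `max`, or any other function of a list of values, covers the
"take the larger one" alternative without changing the default.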

Actually in this case there ~is~ one behavior superior to
summing -- abstracting one member of each data pair that shares
indices into a second (very sparse) "overlay" matrix.  Perhaps it is
my negligence not to have done this instead of querying the list :-)
I am doing it now.
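
The overlay idea can be sketched as follows, again in plain Python on
(i, j, x) triplets.  This assumes an index pair occurs at most twice (a
"data pair"); a pair occurring three or more times would still leave
duplicates in the overlay.  The name `split_overlay` and the sample data
are hypothetical.

```python
def split_overlay(i, j, x):
    """Keep the first entry per (i, j) pair; move repeats to an overlay."""
    seen = set()
    base, overlay = [], []
    for triplet in zip(i, j, x):
        key = triplet[:2]  # the (row, column) index pair
        if key in seen:
            overlay.append(triplet)
        else:
            seen.add(key)
            base.append(triplet)
    return base, overlay


# Hypothetical data: the index pair (0, 0) occurs twice.
base, overlay = split_overlay([0, 2, 0, 1], [0, 1, 0, 1],
                              [1.0, 4.0, 2.0, 3.0])
```

Each of the two triplet lists then has unique index pairs and could be
turned into its own sparse matrix.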

Regards,
-John Thaden 

RK> On Jul 9, 2006, at 11:06 AM, Douglas Bates wrote:

  DB> Your matrix Mc should be flagged as invalid.  Martin and I should
  DB> discuss whether we want to add such a test to the validity method.
  DB> It is not difficult to add the test but there will be a penalty in
  DB> that it will slow down all operations on such matrices and I'm not
  DB> sure if we want to pay that price to catch a rather infrequently
  DB> occurring problem.

RK> Elaborating the validity procedure to flag such instances seems
RK> to be well worth the speed penalty in my view.  Of course,
RK> anticipating every such misstep imposes a heavy burden
RK> on developers and constitutes the real "cost" of more elaborate
RK> validity checking.
RK>
RK> [My 2cents based on experience with SparseM.]
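
To make the cost under discussion concrete, here is a rough sketch of such
a duplicate test for a column-compressed layout, in plain Python.  It
assumes the usual convention that column k's row indices are
i[p[k]:p[k+1]]; it is not the Matrix package's validity method, and the
function name is hypothetical.  Every stored index is visited once, so the
check costs time proportional to the number of stored entries each time it
runs.

```python
def has_duplicate_entries(p, i):
    """Return True if any column lists the same row index more than once."""
    for col in range(len(p) - 1):
        rows = i[p[col]:p[col + 1]]  # row indices stored for this column
        if len(set(rows)) != len(rows):
            return True
    return False


# Hypothetical 3 x 2 examples: column pointers p, row indices i.
valid = has_duplicate_entries([0, 1, 3], [0, 1, 2])
invalid = has_duplicate_entries([0, 2, 4], [0, 0, 1, 2])  # row 0 twice in column 0
```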
