[R] package:Matrix handling of data with identical indices
Thaden, John J
ThadenJohnJ at uams.edu
Sun Jul 9 20:53:48 CEST 2006
On Sunday, July 09, 2006 12:31 PM, Roger Koenker (= RK)
<roger at ysidro.econ.uiuc.edu> wrote:
RK> On 7/8/06, Thaden, John J <ThadenJohnJ at uams.edu> wrote:
JT> As there is nothing inherent in either compressed sparse
JT> format that would prevent recognition and handling of
JT> duplicated index pairs, I'm curious why the dgCMatrix
JT> class doesn't also add x values in those instances?
RK> why not multiply them? or take the larger one,
RK> or ...? I would interpret this as a case of user
RK> negligence -- there is no "natural" default behavior
RK> for such cases.
This user created example data to illustrate his question, but
of course he faces real data (analytical chemical data, in this
case) that happen to come with an 8.4% occurrence of non-unique
index pairs and, quite literally, with a "natural" way to treat
such cases: the ~nature~ of the assay makes it correct to sum
them. I can think of other data sets where averaging would be
the "natural" behavior. So you are right that there is no single
"natural" default, hence my suggestion to leave the choice to
the user via a function argument or class slot, defaulted to
summing.
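For concreteness, here is a minimal sketch of what I mean by
summing, with made-up triplet data and using the sparseMatrix()
constructor from the Matrix package (treat it as pseudocode if
that constructor differs from what is available to you):
duplicated (i, j) pairs are collapsed by summation before the
compressed matrix is built.

  ## made-up triplets: the pair (2, 1) occurs twice
  library(Matrix)
  i <- c(1L, 2L, 2L, 3L)   # row indices
  j <- c(1L, 1L, 1L, 2L)   # column indices
  x <- c(10,  5,  7,  3)   # values; the duplicates should sum to 12

  key <- paste(i, j, sep = ",")
  xs  <- tapply(x, key, sum)                    # sum values sharing a pair
  ij  <- do.call(rbind, strsplit(names(xs), ","))
  Mc  <- sparseMatrix(i = as.integer(ij[, 1]),
                      j = as.integer(ij[, 2]),
                      x = as.numeric(xs))
  Mc                                            # a 3 x 2 dgCMatrix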
Actually, in this case there ~is~ one behavior superior to
summing -- abstracting one member of each data pair that shares
indices into a second (very sparse) "overlay" matrix. Perhaps it
is my negligence not to have done this instead of querying the
list :-) I am doing it now.
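A rough sketch of that overlay idea, again with made-up triplets
and the same hedge about the constructor: first occurrences stay
in the main matrix, repeats go into a second, much sparser
matrix, and adding the two reproduces the summed result when
summing is what you want.

  library(Matrix)
  i <- c(1L, 2L, 2L, 3L); j <- c(1L, 1L, 1L, 2L); x <- c(10, 5, 7, 3)
  dup   <- duplicated(paste(i, j, sep = ","))   # TRUE for repeated (i, j)
  Mmain <- sparseMatrix(i = i[!dup], j = j[!dup], x = x[!dup], dims = c(3, 2))
  Mover <- sparseMatrix(i = i[dup],  j = j[dup],  x = x[dup],  dims = c(3, 2))
  Mmain + Mover                                 # recovers the summed matrix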
Regards,
-John Thaden
RK> On Jul 9, 2006, at 11:06 AM, Douglas Bates wrote:
DB> Your matrix Mc should be flagged as invalid. Martin and I should
DB> discuss whether we want to add such a test to the validity method. It
DB> is not difficult to add the test but there will be a penalty in that
DB> it will slow down all operations on such matrices and I'm not sure if
DB> we want to pay that price to catch a rather infrequently occurring
DB> problem.
RK> Elaborating the validity procedure to flag such instances seems
RK> to be well worth the speed penalty in my view. Of course,
RK> anticipating every such misstep imposes a heavy burden
RK> on developers and constitutes the real "cost" of more elaborate
RK> validity checking.
RK>
RK> [My 2cents based on experience with SparseM.]
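For what it's worth, a check along these lines need not be very
elaborate. Here is one possible (untested, my own) way to test a
column-compressed matrix for repeated (row, column) pairs using
its i and p slots:

  hasDupIJ <- function(M) {
    ## expand the column pointer p into one column index per stored value,
    ## then look for repeated (row, column) pairs
    ncols <- length(M@p) - 1L
    cols  <- rep.int(seq_len(ncols), diff(M@p))
    any(duplicated(cbind(M@i, cols)))
  }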