[Rd] rowSums()/colSums() don't preserve the 'integer' storage mode
Henrik Bengtsson
hb at stat.berkeley.edu
Fri Jul 18 03:42:22 CEST 2008
Hi,
there is one more thing to consider: the risk of getting integer
overflow. Should that be ignored? How is this handled by sum()?
I think it is good if {col|row}Sums() would return the same data type
as the input object.
Cheers
Henrik
On Thu, Jul 17, 2008 at 6:32 PM, Herve Pages <hpages at fhcrc.org> wrote:
> Hi Bill,
>
> Bill.Venables at csiro.au wrote:
>>
>> I don't see the cost of doing so paying off.
>
> The cost seems rather small. For the "if" block that is tagged /* columns */
> in the do_colsum() function (src/main/array.c), that would give something
> like:
>
> if (OP == 0 || OP == 1) { /* columns */
> int ans_type = (type == REALSXP || OP == 1) ? REALSXP : INTSXP;
> int ina = type == INTSXP ? NA_INTEGER : NA_LOGICAL;
> PROTECT(ans = allocVector(ans_type, p));
> for (j = 0; j < p; j++) {
> double rsum = 0.0;
> int isum = 0;
> switch (type) {
> case REALSXP:
> rx = REAL(x) + n*j;
> for (i = cnt = 0; i < n; i++, rx++) {
> if (!ISNAN(*rx)) {cnt++; rsum += *rx;}
> else if (keepNA) {rsum = NA_REAL; break;}
> }
> if (OP == 1) {
> if (cnt == 0 || ISNAN(rsum))
> REAL(ans)[j] = NA_REAL;
> else
> REAL(ans)[j] = rsum / cnt;
> } else
> REAL(ans)[j] = rsum;
> break;
> case INTSXP: case LGLSXP:
> ix = (type == INTSXP ? INTEGER(x) : LOGICAL(x)) + n*j;
> for (i = cnt = 0; i < n; i++, ix++) {
> if (*ix != ina) {cnt++; isum += *ix;}
> else if (keepNA) {isum = NA_INTEGER; break;}
> }
> if (OP == 1) {
> if (cnt == 0 || isum == NA_INTEGER)
> REAL(ans)[j] = NA_REAL;
> else
> REAL(ans)[j] = ((double) isum) / cnt;
> } else
> INTEGER(ans)[j] = isum;
> break;
> default:
> /* we checked the type above, but be sure */
> UNIMPLEMENTED_TYPEt("do_colsum", type);
> }
> }
> }
>
> So now you have 42 lines instead of 37 in the current code.
> Then do something similar for the "if" block tagged /* rows */.
> Basically the new code would not be more complicated than the
> current code.
>
> In the end rowSums()/colSums() will do the _right_ thing i.e.
> they'll preserve the 'integer' storage mode, and, by doing so, will
> behave consistently with the sum() function.
>
> BTW since the LGLSXP type is supported then the "'x' must be numeric"
> error msg at the beginning of the do_colsum() function is inappropriate.
>
> Cheers,
> H.
>
>
>>
>> storage.mode is really only important if you are passing arguments to
>> compiled code.
>>
>> If you are passing to compiled code, you really need to ensure the
>> storage mode is what you think it is, anyway.
>>
>> Bill Venables.
>>
>>
>> -----Original Message-----
>> From: r-devel-bounces at r-project.org
>> [mailto:r-devel-bounces at r-project.org] On Behalf Of Herve Pages
>> Sent: Thursday, 17 July 2008 3:48 PM
>> To: R-devel at r-project.org
>> Subject: [Rd] rowSums()/colSums() don't preserve the 'integer' storage
>> mode
>>
>> Hi,
>>
>> Wouldn't that make sense to have rowSums()/colSums() to preserve the
>> storage mode?
>>
>> m <- matrix(1:15, nrow=5)
>>
>> > storage.mode(m)
>> [1] "integer"
>>
>> > storage.mode(sum(m))
>> [1] "integer"
>>
>> > storage.mode(rowSums(m))
>> [1] "double" <------------------- surprising!
>>
>> Cheers,
>> H.
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
More information about the R-devel
mailing list