[Rd] The function cummax() seems to have a bug.
Henrik Bengtsson
henrik.bengtsson at ucsf.edu
Mon May 18 02:22:45 CEST 2015
Below is some further troubleshooting on this:
>From code inspection this bug happens for only:
* for integer values
* when the first element is NA_integer_ and the second is not.
Examples:
# Numeric/doubles works as expected
> cummax(c(NA_real_, 0, 1, 2, 3))
[1] NA NA NA NA NA
# It does not occur when the first value is non-NA
> cummax(c(0L, NA_integer_, 1L, 2L, 3L))
[1] 0 NA NA NA NA
# When first value is NA, it is not "remembered"
# (because internal for loop starts with 2nd element)
> cummax(c(NA_integer_, 0L, 1L, 2L, 3L))
[1] NA 0 1 2 3
The problem is not there for cummin():
> cummin(c(0L, NA_integer_, 1L, 2L, 3L))
[1] 0 NA NA NA NA
> cummin(c(NA_integer_, 0L, 1L, 2L, 3L))
[1] NA NA NA NA NA
but that is just "pure luck" due to the fact how NA_integer_ is
internally represented as the smallest possible 4-byte signed integer,
i.e.
LibExtern int R_NaInt; /* NA_INTEGER:= INT_MIN currently */
#define NA_INTEGER R_NaInt
Note the comment, which implies that code should not rely on the
assumption that NA_integer_ == NA_INTEGER == R_NaInt == INT_MIN; it
could equally well have been INT_MAX, which in case cummin()would
return the wrong result whereas cummax() wouldn't. So, cummin() makes
the same mistake ascummax(), where the for-loop skips the test for NA
of the first element, cf.
https://github.com/wch/r-source/blob/trunk/src/main/cum.c#L145-L148
The simple solution is probably to do (cf. native icumsum):
[HB-X201]{hb}: svn diff src/main/cum.c
Index: src/main/cum.c
===================================================================
--- src/main/cum.c (revision 68378)
+++ src/main/cum.c (working copy)
@@ -130,7 +130,7 @@
int *ix = INTEGER(x), *is = INTEGER(s);
int max = ix[0];
is[0] = max;
- for (R_xlen_t i = 1 ; i < xlength(x) ; i++) {
+ for (R_xlen_t i = 0 ; i < xlength(x) ; i++) {
if(ix[i] == NA_INTEGER) break;
is[i] = max = (max > ix[i]) ? max : ix[i];
}
@@ -142,7 +142,7 @@
int *ix = INTEGER(x), *is = INTEGER(s);
int min = ix[0];
is[0] = min;
- for (R_xlen_t i = 1 ; i < xlength(x) ; i++ ) {
+ for (R_xlen_t i = 0 ; i < xlength(x) ; i++ ) {
if(ix[i] == NA_INTEGER) break;
is[i] = min = (min < ix[i]) ? min : ix[i];
}
/Henrik
On Sun, May 17, 2015 at 4:13 AM, Dongcan Jiang <dongcan.jiang at gmail.com> wrote:
> Hi,
>
> The function cummax() seems to have a bug.
>
>> x <- c(NA, 0)
>> storage.mode(x) <- "integer"
>> cummax(x)
> [1] NA 0
>
> The correct result of this case should be NA NA. The mistake in
> [https://github.com/wch/r-source/blob/trunk/src/main/cum.c#L130-L136] may be
> the reason.
>
> Best Regards,
> Dongcan
>
> --
> Dongcan Jiang
> Team of Search Engine & Web Mining
> School of Electronic Engineering & Computer Science
> Peking University, Beijing, 100871, P.R.China
More information about the R-devel
mailing list