[Rd] sequence(c(2, 0, 3)) produces surprising results, would (PR#9813)

bill at insightful.com bill at insightful.com
Thu Jul 26 20:38:41 CEST 2007


On Thu, 26 Jul 2007 bill at insightful.com wrote:

> Full_Name: Bill Dunlap
> Version: 2.5.0
> OS: Linux
> Submission from: (NULL) (70.98.76.47)
>
> sequence(nvec) is documented to return
> the concatenation of seq(nvec[i]), for
> i in seq(along=nvec).  This produces inconvenient
> (for me) results for 0 inputs.
>     > sequence(c(2,0,3)) # would like 1 2 1 2 3, ignore 0
>     [1] 1 2 1 0 1 2 3
> Would changing sequence(nvec) to use seq_len(nvec[i])
> instead of the current 1:nvec[i] break much existing code?
>
> On the other hand, almost no one seems to use sequence()
> and it might make more sense to allow seq_len() and seq()
> to accept a vector for length.out and they would return a
> vector of length sum(length.out),
>     c(seq_len(length.out[1]), seq_len(length.out[2]), ...)
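
To make the 0 case concrete, here is a small standalone C sketch of
how 1:n and seq_len(n) differ.  It is not R's implementation; the
function names colon_from_one and seq_len_scalar are made up for
illustration, and the caller is assumed to supply a large enough
buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: mimics R's 1:n for an integer n.
 * Counts up when n >= 1 and counts DOWN when n < 1, so n == 0
 * yields the two values 1, 0.  Returns the number of values written. */
size_t colon_from_one(int n, int *out)
{
    size_t k = 0;
    assert(out != NULL);
    if (n >= 1) {
        for (int v = 1; v <= n; v++) out[k++] = v;
    } else {
        for (int v = 1; v >= n; v--) out[k++] = v;
    }
    return k;
}

/* Illustrative only: mimics seq_len(n) for n >= 0.
 * Writes 1, 2, ..., n and nothing at all when n == 0. */
size_t seq_len_scalar(int n, int *out)
{
    size_t k = 0;
    assert(out != NULL && n >= 0);
    for (int v = 1; v <= n; v++) out[k++] = v;
    return k;
}
```

With n = 0 the first function writes two values (1 and 0), which is
where the stray "1 0" in sequence(c(2,0,3)) comes from; the second
writes none.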

seq_len() could be changed to do that with the patch at the
end of this message.  It does slow down seq_len() in the scalar case:
                             old time    new time
for(i in 1:1e6)seq_len(2)    1.251       1.516
for(i in 1:1e6)seq_len(20)   1.690       1.990
for(i in 1:1e6)seq_len(200)  5.480       5.860

It becomes much faster than sequence() in the vectorized case:
   > unix.time(for(i in 1:1e4)sequence(20:1))
      user  system elapsed
     1.550   0.000   1.557
   > unix.time(for(i in 1:1e4)seq_len(20:1))
      user  system elapsed
     0.070   0.000   0.066
   > identical(sequence(20:1), seq_len(20:1))
   [1] TRUE
My problem cases are where the length.out vector is long
and contains small integers (e.g., the output of table
on a vector of mostly unique values).
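
The patch below uses a two-pass approach: sum the lengths, allocate
the result once, then fill it.  A rough standalone sketch of that idea
outside the R API follows.  The name concat_seq_len, the malloc-based
allocation, and the NULL-on-error convention are assumptions for
illustration only.  The overflow guard is also an addition of this
sketch; the patch accumulates the total in a plain int.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper, not part of R: returns a malloc'd array holding
 * 1..lens[0], 1..lens[1], ... concatenated, and stores the total
 * length in *total.  Returns NULL if a length is negative, the total
 * would exceed INT_MAX, or allocation fails.  Caller frees the result. */
int *concat_seq_len(const int *lens, size_t nlens, size_t *total)
{
    size_t n = 0;
    assert(lens != NULL && total != NULL);

    /* Pass 1: validate each length and accumulate the output size. */
    for (size_t j = 0; j < nlens; j++) {
        if (lens[j] < 0) return NULL;
        if ((size_t)lens[j] > (size_t)INT_MAX - n) return NULL;
        n += (size_t)lens[j];
    }

    /* Allocate at least one element so an empty result is not NULL. */
    int *ans = malloc((n ? n : 1) * sizeof *ans);
    if (ans == NULL) return NULL;

    /* Pass 2: write each run 1..lens[j]; zero lengths write nothing. */
    int *p = ans;
    for (size_t j = 0; j < nlens; j++)
        for (int i = 0; i < lens[j]; i++)
            *p++ = i + 1;

    *total = n;
    return ans;
}
```

Called with the lengths 2, 0, 3 this yields the five values
1 2 1 2 3, the result wanted from sequence(c(2,0,3)).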

Index: src/main/seq.c
===================================================================
--- src/main/seq.c	(revision 42329)
+++ src/main/seq.c	(working copy)
@@ -594,16 +594,31 @@

 SEXP attribute_hidden do_seq_len(SEXP call, SEXP op, SEXP args, SEXP rho)
 {
-    SEXP ans;
-    int i, len, *p;
+    SEXP ans, slengths;
+    int i, *p, anslen, *lens, nlens, ilen, nprotected=0 ;

     checkArity(op, args);
-    len = asInteger(CAR(args));
-    if(len == NA_INTEGER || len < 0)
-	errorcall(call, _("argument must be non-negative"));
-    ans = allocVector(INTSXP, len);
+    slengths = CAR(args);
+    if (TYPEOF(slengths) != INTSXP) {
+    	PROTECT(slengths = coerceVector(CAR(args), INTSXP));
+        nprotected++;
+    }
+    lens = INTEGER(slengths);
+    nlens = LENGTH(slengths);
+    anslen = 0 ;
+    for(ilen=0;ilen<nlens;ilen++) {
+        int len = lens[ilen] ;
+        if(len == NA_INTEGER || len < 0)
+	    errorcall(call, _("argument must be non-negative"));
+        anslen += len ;
+    }
+    ans = allocVector(INTSXP, anslen);
     p = INTEGER(ans);
-    for(i = 0; i < len; i++) p[i] = i+1;
-
+    for(ilen=0;ilen<nlens;ilen++) {
+        int len = lens[ilen] ;
+        for(i = 0; i < len; i++) *p++ = i+1;
+    }
+    if(nprotected>0)
+        UNPROTECT(nprotected);
     return ans;
 }


