[R] Somewhat disconcerting behavior of seq.int()

Bert Gunter
Tue May 3 03:45:40 CEST 2022


** Disconcerting to me, anyway; perhaps not to others**
(Apologies if this has been discussed before. I was a bit nonplussed by
it, but maybe I'm just clueless.) Anyway:

Here are two almost identical versions of the Sieve of Eratosthenes.
The only difference between them is the call to seq.int() that is
highlighted below:

sieve1 <- function(m){
   if(m < 2) return(NULL)
   a <- floor(sqrt(m))
   pr <- Recall(a)
####################
   s <- seq.int(2, to = m) ## Only difference here
######################
   for( i in pr) s <- s[as.logical(s %% i)]
   c(pr,s)
}

sieve2 <- function(m){
   if(m < 2) return(NULL)
   a <- floor(sqrt(m))
   pr <- Recall(a)
####################
   s <- seq.int(2, to = m, by =1) ## Only difference here
#######################
   for( i in pr) s <- s[as.logical(s %% i)]
   c(pr,s)
}

However, execution time is *quite* different.

library(microbenchmark)

> microbenchmark(l1 <- sieve1(1e5), times =50)
Unit: milliseconds
                expr      min       lq     mean  median       uq      max
 l1 <- sieve1(1e+05) 3.957084 3.997959 4.732045 4.01698 4.184918 7.627751
 neval
    50

> microbenchmark(l2 <- sieve2(1e5), times =50)
Unit: milliseconds
                expr      min      lq     mean   median       uq      max
 l2 <- sieve2(1e+05) 681.6209 682.555 683.8279 682.9368 685.2253 687.9464
 neval
    50

Now note that:
> identical(l1, l2)
[1] FALSE

## Because:
> str(l1)
 int [1:9592] 2 3 5 7 11 13 17 19 23 29 ...

> str(l2)
 num [1:9592] 2 3 5 7 11 13 17 19 23 29 ...
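
To make the point above concrete on a small case: a minimal sketch, using
a short range instead of the sieve output (x_int and x_dbl are names made
up for this illustration). It suggests the values agree and only the
storage type differs, which is why identical() says FALSE.

```r
# Same two calls as in sieve1 and sieve2, but on a tiny range
x_int <- seq.int(2, to = 20)          # no 'by': integer result
x_dbl <- seq.int(2, to = 20, by = 1)  # double 'by': double result

typeof(x_int)                         # "integer"
typeof(x_dbl)                         # "double"

identical(x_int, x_dbl)               # FALSE: identical() also compares types
isTRUE(all.equal(x_int, x_dbl))       # TRUE: values agree
identical(x_int, as.integer(x_dbl))   # TRUE once the types match
```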

I therefore assume that seq.int(), an internal generic, is dispatching
to a method that uses integer arithmetic for sieve1 and floating point
for sieve2. Is this correct? If not, what do I fail to understand? And
is this indeed the source of the large difference in execution time?
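
One way to probe the timing question separately from the sieve might be
to time %% alone on an integer vector and on its double copy. This is
only a rough sketch: the divisor 7 and the 200 repetitions are arbitrary
choices of mine, and the timings will depend on platform and R version,
so no particular result is claimed here.

```r
m <- 1e5
s_int <- seq.int(2, to = m)   # integer, as in sieve1
s_dbl <- as.double(s_int)     # same values stored as double, as in sieve2
stopifnot(is.integer(s_int), is.double(s_dbl))

# Repeat the modulo operation to get a measurable elapsed time
t_int <- system.time(for (k in 1:200) r_int <- s_int %% 7L)
t_dbl <- system.time(for (k in 1:200) r_dbl <- s_dbl %% 7)

# Elapsed seconds for each storage type
print(c(integer = t_int[["elapsed"]], double = t_dbl[["elapsed"]]))
```

If the double timing is much larger, that would point at %% on doubles
rather than at seq.int() itself as the source of the difference.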

Further, ?seq.int says:
"The interpretation of the unnamed arguments of seq and seq.int is not
standard, and it is recommended always to name the arguments when
programming."

The above suggests that this advice should perhaps be qualified, and/or
that some comments in the Help file regarding this behavior might be
useful to naïfs like me.
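
For what it's worth, it seems possible to follow the naming advice and
still stay in integer arithmetic by passing integer values for all three
arguments. A sketch only: sieve3 is a hypothetical third variant, not
something from the Help file, and I have not benchmarked it here.

```r
sieve3 <- function(m) {
  if (m < 2) return(NULL)
  a <- floor(sqrt(m))
  pr <- Recall(a)
  # All arguments named, all of integer type, so the result is integer
  s <- seq.int(from = 2L, to = as.integer(m), by = 1L)
  for (i in pr) s <- s[as.logical(s %% i)]
  c(pr, s)
}

p <- sieve3(100)
typeof(p)   # "integer"
length(p)   # 25 primes up to 100
```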

In case it makes a difference (and it might!):

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.3.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] microbenchmark_1.4.9

loaded via a namespace (and not attached):
[1] compiler_4.2.0 tools_4.2.0


Thanks for any enlightenment and again apologies if I am plowing old ground.

Best to all,

Bert Gunter


