[R] [External] Somewhat disconcerting behavior of seq.int()

Bert Gunter
Tue May 3 06:37:37 CEST 2022


Well, I'm on an M1 Mac, so that is certainly different than either of
your systems. I installed the precompiled binary, which may also have
something to do with it. Whether these make a difference I have no
clue.

However, the fact remains that the Help file *does* warn that the type
of the seq.int() value is essentially indeterminate, and when I
explicitly cast it to integer, all is well. So mea culpa.
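
A minimal sketch of that check, using only base R and a small upper
limit of 10 for illustration. The variable names are made up here, and
per the Help file the type that seq.int() returns should be inspected
rather than assumed:

```r
# Illustrative only: a small limit instead of 1e5; names are hypothetical.
m <- 10

s_default <- seq.int(2, to = m)          # the call used in sieve1
s_by      <- seq.int(2, to = m, by = 1)  # the call used in sieve2

# Inspect the storage types rather than relying on them.
typeof(s_default)
typeof(s_by)

# Explicit cast: the result is integer whatever seq.int() returned.
s_cast <- as.integer(seq.int(2, to = m, by = 1))
stopifnot(is.integer(s_cast), identical(s_cast, 2:10))
```

Casting once, right after the seq.int() call, keeps every later
subsetting and %% step on integer vectors.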

I will fool around some tomorrow with more careful profiling to see if
I can learn anything, but the best I can say at present is: it is what it
is. Unless, of course, someone provides an answer before then.
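
A rough sketch of how such a comparison could be set up with base R's
system.time(), isolating one filtering pass of the sieve on integer
versus double storage. The helper filter_step() and the repeat count of
50 are assumptions for illustration, and the timings will depend on the
machine and the R build:

```r
# Hypothetical helper: one sieve-style filtering pass over s for divisor i.
filter_step <- function(s, i) s[as.logical(s %% i)]

s_int <- 2:100000             # integer storage
s_dbl <- as.numeric(s_int)    # same values, double storage

# Same survivors either way; only the storage type differs.
stopifnot(all(filter_step(s_int, 3L) == filter_step(s_dbl, 3)))

# Time repeated passes for each storage type (results are machine-dependent).
system.time(for (k in 1:50) filter_step(s_int, 3L))
system.time(for (k in 1:50) filter_step(s_dbl, 3))
```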

Bert Gunter


On Mon, May 2, 2022 at 8:53 PM <luke-tierney using uiowa.edu> wrote:
>
> Something is very different about your system. On my Linux system I get
>
> > microbenchmark(l1 <- sieve1(1e5), times =50)
> Unit: milliseconds
>                  expr     min       lq     mean   median       uq     max neval
>   l1 <- sieve1(1e+05) 5.04615 5.350576 6.967507 5.787626 7.323502 28.3085    50
> > microbenchmark(l2 <- sieve2(1e5), times =50)
> Unit: milliseconds
>                  expr      min       lq     mean   median      uq      max neval
>   l2 <- sieve2(1e+05) 14.58763 15.79368 17.00738 16.29299 17.0723 30.57338    50
>
> Similar on an Intel Mac.
>
> Best,
>
> luke
>
> On Tue, 3 May 2022, Bert Gunter wrote:
>
> > ** Disconcerting to me, anyway; perhaps not to others**
> > (Apologies if this has been discussed before. I was a bit nonplussed by
> > it, but maybe I'm just clueless.) Anyway:
> >
> > Here are two almost identical versions of the Sieve of Eratosthenes.
> > The difference between them is only in the call to seq.int() that is
> > highlighted
> >
> > sieve1 <- function(m){
> >   if(m < 2) return(NULL)
> >   a <- floor(sqrt(m))
> >   pr <- Recall(a)
> > ####################
> >   s <- seq.int(2, to = m) ## Only difference here
> > ######################
> >   for( i in pr) s <- s[as.logical(s %% i)]
> >   c(pr,s)
> > }
> >
> > sieve2 <- function(m){
> >   if(m < 2) return(NULL)
> >   a <- floor(sqrt(m))
> >   pr <- Recall(a)
> > ####################
> >   s <- seq.int(2, to = m, by =1) ## Only difference here
> > #######################
> >   for( i in pr) s <- s[as.logical(s %% i)]
> >   c(pr,s)
> > }
> >
> > However, execution time is *quite* different.
> >
> > library(microbenchmark)
> >
> >> microbenchmark(l1 <- sieve1(1e5), times =50)
> > Unit: milliseconds
> >                expr      min       lq     mean  median       uq      max neval
> > l1 <- sieve1(1e+05) 3.957084 3.997959 4.732045 4.01698 4.184918 7.627751    50
> >
> >> microbenchmark(l2 <- sieve2(1e5), times =50)
> > Unit: milliseconds
> >                expr      min      lq     mean   median       uq      max neval
> > l2 <- sieve2(1e+05) 681.6209 682.555 683.8279 682.9368 685.2253 687.9464    50
> >
> > Now note that:
> >> identical(l1, l2)
> > [1] FALSE
> >
> > ## Because:
> >> str(l1)
> > int [1:9592] 2 3 5 7 11 13 17 19 23 29 ...
> >
> >> str(l2)
> > num [1:9592] 2 3 5 7 11 13 17 19 23 29 ...
> >
> > I therefore assume that seq.int(), an internal generic, is dispatching
> > to a method that uses integer arithmetic for sieve1 and floating point
> > for sieve2. Is this correct? If not, what do I fail to understand? And
> > is this indeed the source of the large difference in execution time?
> >
> > Further, ?seq.int says:
> > "The interpretation of the unnamed arguments of seq and seq.int is not
> > standard, and it is recommended always to name the arguments when
> > programming."
> >
> > The above suggests that maybe this advice should be qualified, and/or
> > that adding some comments to the Help file regarding this behavior
> > might be useful to naïfs like me.
> >
> > In case it makes a difference (and it might!):
> >
> >> sessionInfo()
> > R version 4.2.0 (2022-04-22)
> > Platform: x86_64-apple-darwin17.0 (64-bit)
> > Running under: macOS Monterey 12.3.1
> >
> > Matrix products: default
> > LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
> >
> > locale:
> > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > other attached packages:
> > [1] microbenchmark_1.4.9
> >
> > loaded via a namespace (and not attached):
> > [1] compiler_4.2.0 tools_4.2.0
> >
> >
> > Thanks for any enlightenment and again apologies if I am plowing old ground.
> >
> > Best to all,
> >
> > Bert Gunter
> >
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>     Actuarial Science
> 241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


