[R] [External] Somewhat disconcerting behavior of seq.int()
Ebert,Timothy Aaron
tebert @end|ng |rom u||@edu
Tue May 3 18:16:55 CEST 2022
microbenchmark(sieve1(1e5), sieve2(1e5), times = 50)
Unit: milliseconds
          expr       min        lq     mean  median      uq      max neval cld
 sieve1(1e+05)  6.863301  7.259901 10.89202 10.6819 13.7993  18.3648    50   a
 sieve2(1e+05) 22.996701 28.284901 32.54760 30.1501 31.0805 166.4092    50   b
The difference is small in absolute terms (roughly 20 ms at the median) but significant.
By setting "by" you are executing a different piece of code, regardless of whether the step size equals the default. I tried by = TRUE, by = 1L, and by = 1.0, but all of those variants gave non-significant differences in execution time, even with times = 500.
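For anyone who wants to check this on their own machine, here is a minimal sketch. The variable names (n, s_default, s_by_int, s_by_dbl) are made up for illustration and are not from the sieve functions above. It builds the same sequence with and without "by", prints the storage type of each result, and times each variant with base R only. The documentation for seq.int() says not to rely on whether the result is integer or double, so the types are printed rather than assumed.

```r
n <- 1e5

# Same values, three ways of asking for them (names are illustrative).
s_default <- seq.int(3, n)            # "by" omitted
s_by_int  <- seq.int(3L, n, by = 1L)  # integer "by"
s_by_dbl  <- seq.int(3, n, by = 1)    # double "by"

# Storage type may differ between the variants; inspect rather than assume.
print(c(default = typeof(s_default),
        by_int  = typeof(s_by_int),
        by_dbl  = typeof(s_by_dbl)))

# The values themselves should agree element by element.
print(all(s_default == s_by_int) && all(s_default == s_by_dbl))

# Rough timings with base R; results will vary by machine and R build.
print(system.time(for (i in seq_len(200)) seq.int(3, n)))
print(system.time(for (i in seq_len(200)) seq.int(3L, n, by = 1L)))
print(system.time(for (i in seq_len(200)) seq.int(3, n, by = 1)))
```

If the types differ, any later arithmetic on the sequence (such as %% inside a sieve) may run on a different code path as well, which could matter more than the cost of seq.int() itself.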
Tim
-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Avi Gross via R-help
Sent: Tuesday, May 3, 2022 11:47 AM
Cc: R-help using r-project.org
Subject: Re: [R] [External] Somewhat disconcerting behavior of seq.int()
I have no comment on which version of the software, on which machine architecture, might cause the speed differences mentioned.
I have a more general question about increments.
The issue here is about creating a sequence starting at some value and ending no later than another value with an optional step size for the increment that by default is exactly 1.
So how would you normally do this and does it matter?
In some languages like C, the increment can be done many ways, including val++, or val += 1, or val = val + 1 ...
Those could produce identical code or different code. The compiler may optimize it or go and evaluate things twice at times. You might have hardware that contains a register that has a rapid increment-by-1 instruction built in but not one for a decrement by one, let alone an increment by 12.
And as has been discussed here, data types of various size ints (or float representations of an integer) may need all kinds of conversions to operate on the same playing field. Who hasn't used a language where a boolean value can be treated like a zero or a one but stored compactly and yet if it is added to a large integer, needs some conversions?
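To make the conversion point concrete in R terms, here is a small sketch (the names flag and count are just for illustration). R coerces along logical -> integer -> double whenever types are mixed in arithmetic, and typeof() shows what you ended up with.

```r
flag  <- TRUE   # logical
count <- 5L     # integer

print(typeof(flag))           # "logical"
print(typeof(flag + count))   # "integer": the logical is promoted first
print(typeof(count + 0.5))    # "double": the integer is promoted first

# Logical values act as 0/1 in arithmetic, so summing counts the TRUEs.
hits <- c(TRUE, FALSE, TRUE)
print(sum(hits))              # 2
```

None of this is visible in the code you write, which is rather the point: the conversions happen underneath whether or not you think about them.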
How many assumptions can we really make about how someone wrote a function we use? Is it possible that the programmer of something like seq.int() decided to optimize their code, perhaps written in a dialect of C, so that when "by=" is not specified, it increments using val++, while if it is specified, even as "by=1", it switches to incrementing using something like "val = val + by", which may turn out to be slower?
I end by saying I am NOT talking about the complaint here, as it seems likely to be specific to their setup. But more globally, it is possible that choices in how something is programmed can affect the outcome, and assuming it was done the way you would do it is not always warranted. Languages that are more flexible, such as interpreted languages that allow many kinds of polymorphism, can be very nice but come with lots of overhead: underneath it all, the deeper programming levels require precision, and each part must be exactly what fits at that point. You can choose to store your numbers in containers of various sizes, but at some level they tend to be unpacked and converted before being handed to something in software or hardware that expects EXACTLY one representation. You may save on storage but sacrifice speed or other things.
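As a rough illustration of the storage side of that trade-off in R, the sketch below holds the same values once as integer and once as double (x_int and x_dbl are made-up names). It compares the memory each takes and confirms that %% gives the same answers either way. Which one is faster is exactly the kind of thing that depends on the build and the hardware, so the timings are printed without any claim about how they will come out.

```r
# One million small whole numbers, stored two ways (illustrative names).
x_int <- rep_len(1:10, 1e6)   # integer storage
x_dbl <- as.double(x_int)     # same values, double storage

print(object.size(x_int))     # integers take 4 bytes per element
print(object.size(x_dbl))     # doubles take 8 bytes per element

# Same arithmetic, different underlying representation.
r_int <- x_int %% 2L
r_dbl <- x_dbl %% 2
print(c(typeof(r_int), typeof(r_dbl)))   # "integer" "double"
print(all(r_int == r_dbl))               # TRUE: the answers agree

# Timings depend on machine and R build; no particular outcome is implied.
print(system.time(for (i in seq_len(20)) x_int %% 2L))
print(system.time(for (i in seq_len(20)) x_dbl %% 2))
```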
-----Original Message-----
From: Bert Gunter <bgunter.4567 using gmail.com>
To: Stephanie Evert <stefanML using collocations.de>
Cc: R-help Mailing List <R-help using r-project.org>
Sent: Tue, May 3, 2022 10:41 am
Subject: Re: [R] [External] Somewhat disconcerting behavior of seq.int()
Thank you. But the binary I installed *was* the "Apple silicon arm64 build, signed and notarized package."
Bert
On Tue, May 3, 2022 at 12:25 AM Stephanie Evert <stefanML using collocations.de> wrote:
>
>
>
> > On 3 May 2022, at 07:08, Bert Gunter <bgunter.4567 using gmail.com> wrote:
> >
> >> microbenchmark( v1 <- s1 %% 2, times = 50) ## floating point
> > Unit: milliseconds
> > expr min lq mean median uq max neval
> > v1 <- s1%%2 69.28204 69.60496 69.8957 69.81379 70.01729 71.36125 50
> >
> >> microbenchmark( v2 <- s2 %% 2L, times = 50) ## integer
> > Unit: microseconds
> > expr min lq mean median uq max neval
> > v2 <- s2%%2L 166.626 167.042 172.7431 170.5215 177.667 194.334 50
> >
> > I have no idea why the big difference, but I am pretty sure it's way
> > beyond me. Maybe Mac gurus can figure it out. I may post this on
> > r-sig-mac to see.
>
> Very likely some inefficiency of the Intel emulator on your M1 Mac. I can imagine it has to do with the substantial differences between the Intel and Arm floating-point architectures.
>
> Why not try with a native M1 version of R?
>
> Best,
> Stephanie
______________________________________________
R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_r-2Dhelp&d=DwIFaQ&c=sJ6xIWYx-zLMB3EPkvcnVg&r=9PEhQh2kVeAsRzsn7AkP-g&m=GcWRKLnAFuhomCgVbp6K_6w4IZ6FKDQtwH9ziOpgA35oOVWpYhUruurscWYNqP1p&s=IV_pvMX6BFzDQ6XLe--bqFKiyaklMFk0vwnmDDVc42k&e=
PLEASE do read the posting guide https://urldefense.proofpoint.com/v2/url?u=http-3A__www.R-2Dproject.org_posting-2Dguide.html&d=DwIFaQ&c=sJ6xIWYx-zLMB3EPkvcnVg&r=9PEhQh2kVeAsRzsn7AkP-g&m=GcWRKLnAFuhomCgVbp6K_6w4IZ6FKDQtwH9ziOpgA35oOVWpYhUruurscWYNqP1p&s=DMG-KdLD8dCjjVX1iizZc-FiM8Cn8i1D-MwKt26sBNo&e=
and provide commented, minimal, self-contained, reproducible code.