[Rd] WISH: set.seed(seed) to produce error if length(seed) != 1 (now silent)
Jake Elmstedt
j@ke@e|m@tedt @end|ng |rom gm@||@com
Fri Sep 17 23:57:18 CEST 2021
What about splitting the baby and having set.seed(1:2), set.seed(6.1),
etc. issue a warning rather than throw an error?
It informs the user that their expectations have deviated from
reality, encourages proper programming practices, and carries
substantially lower risk of breaking things than an exception.
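Below is a minimal sketch of the warn-rather-than-error idea as a user-level wrapper. The name set_seed_checked() is hypothetical and not part of base R. The checks assume the behaviour reported in this thread: only seed[1] is used, a double such as 6.1 loses its fractional part, and a character seed such as "42" is coerced.

```r
# Hypothetical wrapper (not base R): warns, rather than errors, when 'seed'
# is not a single whole number, then delegates to base::set.seed().
set_seed_checked <- function(seed, ...) {
  if (!is.null(seed)) {
    # set.seed() silently uses only seed[1] when length(seed) > 1
    if (length(seed) > 1L) {
      warning(sprintf("'seed' has length %d; only seed[1] is used",
                      length(seed)), call. = FALSE)
    }
    # length(seed) == 0 is left to set.seed(), which already errors
    if (length(seed) >= 1L) {
      s <- seed[[1L]]
      if (!is.numeric(s)) {
        # e.g. "42" or TRUE: accepted by set.seed() only via coercion
        warning("'seed' is not numeric; it will be coerced to an integer",
                call. = FALSE)
      } else if (is.finite(s) && s != trunc(s)) {
        # e.g. 6.1 and 6.2 end up giving the same RNG state
        warning(sprintf("'seed' = %s is not a whole number; its fractional part is dropped",
                        format(s)), call. = FALSE)
      }
    }
  }
  set.seed(seed, ...)
}
```

With this wrapper, set_seed_checked(1:3) warns and then seeds exactly as set.seed(1) does, while set_seed_checked(42) stays silent. An error-raising variant, as originally proposed, would replace warning() with stop().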
On Fri, Sep 17, 2021 at 1:13 PM Avi Gross via R-devel
<r-devel using r-project.org> wrote:
>
> R wobbles a bit here because it has no ordinary data type for a scalar (singleton) variable. Saying x <- 5 just creates a vector that currently has length 1. It is perfectly legal to then write x[2] <- 6 and so on, and the vector lengthens. You can truncate it back to length 1 if you wish: length(x) <- 1
>
> So the question here is: what happens if you supply more information than is needed? If the seed is an integer vector of length greater than one, should set.seed() ignore everything but the first entry? I note it happily accepts not-quite integers like TRUE and FALSE. It also accepts floating-point numbers like 1.23 or 1.2e5.
>
> The goal seems to be to set a unique starting point, rounded or transformed if needed. The visible R part of the function does not even look at the seed before passing it to the internal code. So although choosing the first integer in a vector superficially makes some sense, it can be a problem if a program assumes the entire vector is consumed, perhaps hashed in some way to make a seed. If the program later changes elements other than the first, it may assume that resetting the seed gives a different result, and yet the result may be exactly the same.
>
> So, yes, I suspect it should be an ERROR to accept anything that cannot be coerced, by something like as.integer(), into a vector of length 1.
>
> I have noticed other places in R where passing a longer vector gives a warning that only the first element will be used. Are they all problems that need to be addressed?
>
> Here is a short one:
>
> > x <- c(1:3)
> > if (x > 2) y <- TRUE
> Warning message:
> In if (x > 2) y <- TRUE :
> the condition has length > 1 and only the first element will be used
> > y
> Error: object 'y' not found
>
> The above is not vectorized: it uses only the first element, x[1] == 1, so the condition is FALSE and y is never set.
>
> Now a vectorized variant works as expected, making a vector of length 3 for y:
>
> > x
> [1] 1 2 3
>
> > y <- ifelse(x > 2, TRUE, FALSE)
> > y
> [1] FALSE FALSE TRUE
>
> I have no doubt that fixing lots of this stuff, if it is indeed a fix, could break lots of existing code. Sure, it does no harm to ask programmers to always write x[1] to guarantee they get what they want, or to add a function like first(x) that does the same.
>
> R has some compromises or features I sometimes wonder about. If it had a concept of a numeric scalar, then some things that are now allowed might become errors.
>
> When you multiply a vector by a scalar, as in 5*x, every component of x is multiplied by 5, while x*x does componentwise multiplication. So, with x being c(1:3), what should multiplying a twosome by a threesome do?
>
> > x[1:2]*x
> [1] 1 4 3
> Warning message:
> In x[1:2] * x :
> longer object length is not a multiple of shorter object length
>
> Is it recycling to get a 1 in pseudo-position 3?
>
> Yep, this shows recycling more clearly with a longer x:
>
> > x <- 1:9
> > x[1:2]*x
> [1] 1 4 3 8 5 12 7 16 9
> Warning message:
> In x[1:2] * x :
> longer object length is not a multiple of shorter object length
>
> You do get a warning, but it does not tell you what R actually did.
>
> In essence, the earlier case of 5*x arguably recycled the 5 as many times as needed but with no warning.
>
> My point is that many languages, especially older ones, were designed a certain way and have since been updated, but we may be stuck with what we have. A brand-new language might come up with an approach that vectorizes the heck out of things while allowing, and even demanding, that you explicitly convert things to a scalar in a context that needs one, or explicitly ask for recycling when you want it, or ...
>
> -----Original Message-----
> From: R-devel <r-devel-bounces using r-project.org> On Behalf Of Henrik Bengtsson
> Sent: Friday, September 17, 2021 8:39 AM
> To: GILLIBERT, Andre <Andre.Gillibert using chu-rouen.fr>
> Cc: R-devel <r-devel using r-project.org>
> Subject: Re: [Rd] WISH: set.seed(seed) to produce error if length(seed) != 1 (now silent)
>
> > I’m curious, other than proper programming practice, why?
>
> Life's too short for troubleshooting silent mistakes - mine or others.
>
> While at it, searching the interwebs for uses of set.seed() turns up mistakes/misunderstandings such as set.seed(<double>), e.g.
>
> > set.seed(6.1); sum(.Random.seed)
> [1] 73930104
> > set.seed(6.2); sum(.Random.seed)
> [1] 73930104
>
> which clearly is not what the user expected. There are also a few cases of set.seed(<character>), e.g.
>
> > set.seed("42"); sum(.Random.seed)
> [1] -2119381568
> > set.seed(42); sum(.Random.seed)
> [1] -2119381568
>
> which works just because as.numeric("42") is used.
>
> /Henrik
>
> On Fri, Sep 17, 2021 at 12:55 PM GILLIBERT, Andre <Andre.Gillibert using chu-rouen.fr> wrote:
> >
> > Hello,
> >
> > A vector with a length >= 2 to set.seed would probably be a bug. An error message will help the user to fix his R code. The bug may be accidental or due to bad understanding of the set.seed function. For instance, a user may think that the whole state of the PRNG can be passed to set.seed.
> >
> > The "if" instruction emits a warning when the condition has length >= 2, because it is often a bug. I would expect a warning or error with set.seed().
> >
> > Validating inputs and emitting errors early is a good practice.
> >
> > Just my 2 cents.
> >
> > Sincerely.
> > Andre GILLIBERT
> >
> > -----Original Message-----
> > From: R-devel [mailto:r-devel-bounces using r-project.org] On Behalf Of
> > Avraham Adler
> > Sent: Friday, September 17, 2021 12:07
> > To: Henrik Bengtsson
> > Cc: R-devel
> > Subject: Re: [Rd] WISH: set.seed(seed) to produce error if length(seed) != 1 (now silent)
> >
> > Hi, Henrik.
> >
> > I’m curious, other than proper programming practice, why?
> >
> > Avi
> >
> > On Fri, Sep 17, 2021 at 11:48 AM Henrik Bengtsson <
> > henrik.bengtsson using gmail.com> wrote:
> >
> > > Hi,
> > >
> > > according to help("set.seed"), argument 'seed' to set.seed() should be:
> > >
> > > a single value, interpreted as an integer, or NULL (see ‘Details’).
> > >
> > > From code inspection (src/main/RNG.c) and testing, it turns out that
> > > if you pass a 'seed' with length greater than one, it silently uses
> > > seed[1], e.g.
> > >
> > > > set.seed(1); sum(.Random.seed)
> > > [1] 4070365163
> > > > set.seed(1:3); sum(.Random.seed)
> > > [1] 4070365163
> > > > set.seed(1:100); sum(.Random.seed)
> > > [1] 4070365163
> > >
> > > I'd like to suggest that set.seed() produces an error if
> > > length(seed) > 1. As a reference, for length(seed) == 0, we get:
> > >
> > > > set.seed(integer(0))
> > > Error in set.seed(integer(0)) : supplied seed is not a valid integer
> > >
> > > /Henrik
> > >
> > > ______________________________________________
> > > R-devel using r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >