[R] How important is set.seed
JH@rm@e @end|ng |rom roku@com
Tue Mar 22 15:20:19 CET 2022
Jeff Newmiller makes an interesting point about distributed processing, but I don't know how to use the usual pseudo-random processes to obtain deterministic results when I don't know how the data will be sharded. You might have to replace pseudo-random sampling with deterministic sampling using a hash of something involving the unique key. Then the selection of a salt is the equivalent of a call to set.seed in non-parallel processing. The results should be the same as long as you fix the data set & the salt, and then you can test sensitivity to changes in the salt.
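A minimal sketch of the hash-based idea, not a definitive implementation. It assumes the CRAN package digest is installed; the helper names hash_unit and hash_sample, the example keys, and the salt strings are all hypothetical.

```r
# Assumes the CRAN package 'digest' is installed.
library(digest)

# Map each key to a repeatable number in [0, 1) using an MD5 hash of salt + key.
hash_unit <- function(keys, salt) {
  vapply(as.character(keys), function(k) {
    h <- digest(paste0(salt, ":", k), algo = "md5", serialize = FALSE)
    # The first 7 hex digits fit in an R integer; scale them to [0, 1).
    strtoi(substr(h, 1, 7), base = 16L) / 16^7
  }, numeric(1), USE.NAMES = FALSE)
}

# Keep a row when its hash value falls below the desired sampling fraction.
hash_sample <- function(keys, fraction, salt) {
  hash_unit(keys, salt) < fraction
}

keys <- sprintf("customer-%04d", 1:1000)   # hypothetical unique keys

sel_a <- hash_sample(keys, 0.2, salt = "salt-A")

# Reordering (or sharding) the rows does not change which keys are selected.
idx <- sample(length(keys))
sel_shuffled <- hash_sample(keys[idx], 0.2, salt = "salt-A")
print(identical(sel_a[idx], sel_shuffled))

# A different salt gives a different, equally repeatable, sample.
sel_b <- hash_sample(keys, 0.2, salt = "salt-B")
print(c(mean(sel_a), mean(sel_b)))
```

Because each key's selection depends only on the key and the salt, no worker needs to know how the rest of the data was partitioned. Rerunning with several salts is one way to check how much the results depend on the salt.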
From: Neha gupta <neha.bologna90 using gmail.com>
To: "Ebert,Timothy Aaron" <tebert using ufl.edu>
Cc: Jeff Newmiller <jdnewmil using dcn.davis.ca.us>, "r-help using r-project.org"
<r-help using r-project.org>
Subject: Re: [R] How important is set.seed
Thank you all.
Actually, I need set.seed because I have to evaluate the consistency of
the feature selection produced by different models, so I think using a
seed is recommended for this.
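For illustration, a minimal sketch of seeding before a train/test split so that every model being compared sees the same partition. The built-in iris data set, the 70/30 split, and the seed value are assumptions chosen for the example, not details from the analysis discussed here.

```r
set.seed(2022)                                  # arbitrary value, fixed before the split
n <- nrow(iris)
train_idx <- sample(n, size = round(0.7 * n))   # the only random step in this sketch
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Re-seeding with the same value reproduces the same partition, so each
# model can be trained and evaluated on identical rows.
set.seed(2022)
train_idx_again <- sample(n, size = round(0.7 * n))
print(identical(train_idx, train_idx_again))
```

If the model-fitting step itself draws random numbers, it would need its own set.seed call just before training to be repeatable as well.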
On Tuesday, March 22, 2022, Ebert,Timothy Aaron <tebert using ufl.edu> wrote:
> If you are using the program for data analysis then set.seed() is not
> necessary unless you are developing a reproducible example. In a standard
> analysis it is mostly counter-productive because one should then ask if
> your presented results are an artifact of a specific seed that you selected
> to get a particular result. However, when you need a reproducible example,
> are debugging a program, or otherwise need the same result with every run
> of the program, set.seed() is an essential tool.
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Jeff Newmiller
> Sent: Monday, March 21, 2022 8:41 PM
> To: r-help using r-project.org; Neha gupta <neha.bologna90 using gmail.com>; r-help
> mailing list <r-help using r-project.org>
> Subject: Re: [R] How important is set.seed
> First off, "ML models" do not all use random numbers (for prediction I
> would guess very few of them do). Learn and pay attention to what the
> functions you are using do.
> Second, if you use random numbers properly and understand the precision
> that your specific use case offers, then you don't need to use set.seed.
> However, in practice, using set.seed can allow you to temporarily avoid
> chasing precision gremlins, or set up specific test cases for testing code,
> not results. It is your responsibility to not let this become a crutch... a
> randomized simulation that is actually sensitive to the seed is unlikely to
> offer an accurate result.
> Where to put set.seed depends a lot on how you are performing your
> simulations. In general each process should set it once uniquely at the
> beginning, and if you use parallel processing then use the features of your
> parallel processing framework to ensure that this happens. Beware of
> setting all worker processes to use the same seed.
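One way this advice could look in practice, sketched with the parallel package that ships with R. The cluster size, the seed value, and the helper name run_once are assumptions for illustration only.

```r
library(parallel)

run_once <- function(iseed) {
  cl <- makeCluster(2)                      # two local worker processes
  on.exit(stopCluster(cl))
  # Give each worker its own L'Ecuyer-CMRG stream derived from one seed,
  # instead of calling set.seed() with the same value on every worker.
  clusterSetRNGStream(cl, iseed = iseed)
  parSapply(cl, 1:4, function(i) rnorm(1))
}

r1 <- run_once(123)
r2 <- run_once(123)

print(identical(r1, r2))        # same seed and same cluster size on both runs
print(length(unique(r1)))       # number of distinct draws across the workers
```

Repeatability here depends on using the same number of workers and the same task assignment on each run.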
> On March 21, 2022 5:03:30 PM PDT, Neha gupta <neha.bologna90 using gmail.com> wrote:
> >Hello everyone
> >I want to know
> >(1) In which cases do we need to use set.seed while building ML models?
> >(2) Where exactly should we put the set.seed call, i.e. when we split
> >the data into train/test sets, or just before we train a model?
> >Thank you
> > [[alternative HTML version deleted]]
> >R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >PLEASE do read the posting guide
> >and provide commented, minimal, self-contained, reproducible code.
> Sent from my phone. Please excuse my brevity.
End of R-help Digest, Vol 229, Issue 20