[R] How important is set.seed
tebert using ufl.edu
Tue Mar 22 13:15:01 CET 2022
Ah, so maybe what you need is to think of set.seed() as a treatment in an experiment. You could use a random number generator to select an appropriate number of seeds, then reuse those same seeds with each of the different models to see how seed selection influences the outcomes. I am not sure how many seeds would constitute a good sample; for me that would depend on what I find and how long a run takes.
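As a rough sketch of the "seed as a treatment" idea in base R: the toy data, the two stand-in selection rules (`select_by_cor`, `select_by_lm`), and the choice of 20 seeds are all hypothetical, so substitute your own models and a seed count that fits your run time.

```r
# Toy data (hypothetical): y depends on x1 and x2 only; x3..x6 are noise.
set.seed(1)                                   # fixes the toy data only
n <- 200
X <- as.data.frame(matrix(rnorm(n * 6), n, 6))
names(X) <- paste0("x", 1:6)
dat <- cbind(y = 2 * X$x1 - 1.5 * X$x2 + rnorm(n), X)

# Two stand-in "models". Each fits on a random 70% subsample (the step that
# consumes random numbers) and returns the names of its top 2 features.
select_by_cor <- function(d) {
  idx <- sample(nrow(d), size = floor(0.7 * nrow(d)))
  r <- abs(cor(d[idx, -1], d$y[idx]))[, 1]    # |correlation| with y
  names(sort(r, decreasing = TRUE))[1:2]
}
select_by_lm <- function(d) {
  idx <- sample(nrow(d), size = floor(0.7 * nrow(d)))
  p <- summary(lm(y ~ ., data = d[idx, ]))$coefficients[-1, 4]  # p-values
  names(sort(p))[1:2]
}

# Pick the seeds themselves at random, then reuse the same seeds per model.
seeds <- sample.int(1e6, 20)

run <- function(selector) {
  t(sapply(seeds, function(s) { set.seed(s); sort(selector(dat)) }))
}
res_cor <- run(select_by_cor)   # one row per seed, one column per feature slot
res_lm  <- run(select_by_lm)

# How often is each feature selected across seeds, per model?
table(res_cor)
table(res_lm)
```

If the tables change a lot from seed to seed, the selection is not stable, and that is worth knowing before reporting results from any single seed.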
In parallel processing you set the seed in the master process and then use a random number generator to set the seed in each worker.
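A minimal sketch of that master/worker pattern, run sequentially so it needs no cluster. The worker count and `worker_task` are hypothetical. Note that independently drawn seeds do not guarantee non-overlapping random streams, which is why dedicated parallel generators such as L'Ecuyer-CMRG are usually preferred for serious work.

```r
set.seed(2022)                     # seed set once in the "master" process
n_workers <- 4                     # hypothetical number of workers

# The master's generator produces one distinct seed per worker.
worker_seeds <- sample.int(.Machine$integer.max, n_workers)

# Stand-in for whatever each worker would compute (hypothetical task).
worker_task <- function(seed) {
  set.seed(seed)                   # each worker seeds itself once, at the start
  mean(rnorm(1000))
}

# Run sequentially here; a parallel framework would dispatch the same calls.
results <- vapply(worker_seeds, worker_task, numeric(1))
results
```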
From: Neha gupta <neha.bologna90 using gmail.com>
Sent: Tuesday, March 22, 2022 6:33 AM
To: Ebert,Timothy Aaron <tebert using ufl.edu>
Cc: Jeff Newmiller <jdnewmil using dcn.davis.ca.us>; r-help using r-project.org
Subject: Re: How important is set.seed
Thank you all.
Actually, I need set.seed because I have to evaluate the consistency of the feature selection produced by different models, so I think using a seed is recommended for this.
On Tuesday, March 22, 2022, Ebert,Timothy Aaron <tebert using ufl.edu> wrote:
If you are using the program for data analysis, then set.seed() is not necessary unless you are developing a reproducible example. In a standard analysis it is mostly counter-productive, because one should then ask whether your presented results are an artifact of a specific seed that you selected to get a particular result. However, when you need a reproducible example, are debugging a program, or otherwise need the same result from every run of the program, set.seed() is an essential tool.
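For the reproducible-example case, a small base R sketch: the 80/20 split of the built-in `iris` data and the helper name `split_indices` are illustrative assumptions. The seed is set immediately before the step that draws random numbers, here the train/test split.

```r
# Hypothetical helper: row indices for an 80% training set.
split_indices <- function(seed) {
  set.seed(seed)                       # set right before the random step
  sample(nrow(iris), size = round(0.8 * nrow(iris)))
}

a <- split_indices(42)
b <- split_indices(42)
identical(a, b)                        # same seed, so the same split

other <- split_indices(43)
identical(a, other)                    # a different seed draws a different split
```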
From: R-help <r-help-bounces using r-project.org> On Behalf Of Jeff Newmiller
Sent: Monday, March 21, 2022 8:41 PM
To: r-help using r-project.org; Neha gupta <neha.bologna90 using gmail.com>; r-help mailing list <r-help using r-project.org>
Subject: Re: [R] How important is set.seed
First off, "ML models" do not all use random numbers (for prediction I would guess very few of them do). Learn and pay attention to what the functions you are using do.
Second, if you use random numbers properly and understand the precision that your specific use case offers, then you don't need to use set.seed. However, in practice, using set.seed can allow you to temporarily avoid chasing precision gremlins, or set up specific test cases for testing code, not results. It is your responsibility to not let this become a crutch... a randomized simulation that is actually sensitive to the seed is unlikely to offer an accurate result.
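One way to get a feel for the precision a simulation offers is to repeat it without fixing a seed and look at the run-to-run spread. The target quantity, the two simulation sizes, and the helper names below are hypothetical choices for illustration.

```r
# Hypothetical simulation: estimate P(Z > 1.5) for a standard normal Z.
estimate <- function(n_draws) mean(rnorm(n_draws) > 1.5)

# Repeat the whole simulation many times with no seed set, and measure how
# much the estimate moves from run to run.
spread <- function(n_draws, n_runs = 200) {
  sd(replicate(n_runs, estimate(n_draws)))
}

small <- spread(100)      # few draws per run: estimates move a lot
large <- spread(10000)    # many draws per run: estimates are far more stable

c(small = small, large = large, truth = pnorm(1.5, lower.tail = FALSE))
```

If the spread is large relative to the effect you care about, the result is sensitive to the seed, and the remedy is more draws rather than a carefully chosen seed.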
Where to put set.seed depends a lot on how you are performing your simulations. In general each process should set it once, uniquely, at the beginning, and if you use parallel processing then use the features of your parallel processing framework to ensure that this happens. Beware of setting all worker processes to use the same seed.
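With R's bundled parallel package, for instance, `clusterSetRNGStream()` is the framework feature that gives each worker its own reproducible stream. The sketch below assumes two local workers and an arbitrary `iseed`; it also shows the same-seed pitfall for contrast.

```r
library(parallel)

cl <- makeCluster(2)                      # two local workers (assumed count)

# Give each worker its own reproducible L'Ecuyer-CMRG stream.
clusterSetRNGStream(cl, iseed = 123)
run1 <- parLapply(cl, 1:2, function(i) runif(3))

# Resetting with the same iseed reproduces the same per-worker streams.
clusterSetRNGStream(cl, iseed = 123)
run2 <- parLapply(cl, 1:2, function(i) runif(3))

# The pitfall: every worker sets the same seed, so the draws are duplicated.
bad <- parLapply(cl, 1:2, function(i) { set.seed(123); runif(3) })

stopCluster(cl)

identical(run1, run2)                     # the two runs match each other
identical(run1[[1]], run1[[2]])           # but the workers differ within a run
identical(bad[[1]], bad[[2]])             # same seed everywhere: same numbers
```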
On March 21, 2022 5:03:30 PM PDT, Neha gupta <neha.bologna90 using gmail.com> wrote:
>I want to know
>(1) In which cases do we need to use set.seed when building ML models?
>(2) Where exactly should we put the set.seed call, i.e.
>when we split the data into train/test sets, or just before we train a model?
>R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.