[R] How important is set.seed

Neha gupta neha.bologna90 using gmail.com
Tue Mar 22 17:03:21 CET 2022


Thank you again, Tim.

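library(farff)   # assumed source of readARFF()
library(caret)   # createDataPartition(), trainControl(), train()

d <- readARFF("my data")

set.seed(123)    # fixes the train/test split
# `index` was not defined in the original post; a hypothetical 80/20 split on lneff is assumed
index <- createDataPartition(d$lneff, p = 0.8, list = FALSE)
tr <- d[index, ]
ts <- d[-index, ]

# repeats defaults to 1 for "repeatedcv"; the default search is "grid",
# so add search = "random" if a random search is really intended
ctrl <- trainControl(method = "repeatedcv", number = 10)

set.seed(123)    # fixes the random parts of train(), such as the CV fold assignment
ran_search <- train(lneff ~ ., data = tr,
                    method = "mlp",
                    tuneLength = 30,
                    metric = "MAE",
                    preProc = c("center", "scale", "nzv"),
                    trControl = ctrl)
getTrainPerf(ran_search)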


Is this a good approach?

On Tue, Mar 22, 2022 at 4:34 PM Ebert,Timothy Aaron <tebert using ufl.edu> wrote:

> My inclination is to follow Jeff’s advice and put it at the beginning of
> the program.
>
> You can always experiment:
>
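>
> set.seed(42)
> rnorm(5, 5, 5)   # five draws from a normal with mean 5 and sd 5
> rnorm(5, 5, 5)   # five more draws; not the same values as the first five
> runif(5, 0, 3)   # five uniform draws between 0 and 3
>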
>
> As long as the commands are executed in the order they are written, the
> outcome is the same every time: set.seed() gives you reproducible
> outcomes. However, the second rnorm() does not give the same values as
> the first. set.seed() fixes the starting point of the sequence; if you want
> the first and second rnorm() calls to give the same results, you need
> another set.seed(42) before the second call.
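A minimal base-R sketch of the behaviour described above: the same seed replays the same sequence, while a second call continues the stream instead of restarting it. The variable names are illustrative only.

```r
set.seed(42)
first  <- rnorm(5, 5, 5)   # five draws from N(mean = 5, sd = 5)
second <- rnorm(5, 5, 5)   # continues the same stream, so the values differ

set.seed(42)               # reset the generator to the same starting point
replay <- rnorm(5, 5, 5)   # same position in the stream as `first`

identical(first, replay)   # TRUE
identical(first, second)   # FALSE
```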
>
>
>
> Note also that it does not matter whether you pause: whether you run the
> above code as one chunk or run each command individually, you get the same
> result (as long as you run it in the sequence written). So if you set the
> seed, run some code, take a break, and come back to write more code, you
> might get into trouble, because R is still continuing the random-number
> sequence started by the original set.seed() call.
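A small base-R sketch of that pitfall: code written after the break picks up the stream wherever earlier code left it, so its results depend on how many random numbers were drawn before it. `earlier`, `later_a` and `later_b` are illustrative names.

```r
set.seed(123)
earlier <- runif(3)   # code run before the break consumes three draws
later_a <- runif(4)   # code written after the break continues the stream

set.seed(123)
later_b <- runif(4)   # the same new code, run directly after set.seed()

identical(later_a, later_b)   # FALSE: later_a depends on the earlier draws
```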
>
> To solve this issue, use
>
> set.seed(Sys.time())
>
> or
>
> set.seed(NULL)
>
> Either call starts a new, clock-dependent sequence, so results after it are
> no longer reproducible.
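If a clock-based seed is used but the run may need to be replayed later, one option (an assumption, not something proposed in the thread) is to turn the clock into an integer, record it, and only then seed with it. `run_seed` is a hypothetical name.

```r
# Hypothetical approach: derive an integer seed from the clock and record it
run_seed <- as.integer(Sys.time())
message("seed used: ", run_seed)   # keep this value in the log or output
set.seed(run_seed)
draws <- runif(3)

set.seed(run_seed)                 # replaying with the recorded seed
replay <- runif(3)
identical(draws, replay)           # TRUE

set.seed(NULL)                     # re-initialise as if no seed had been set
```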
>
>
>
> Some of this is just good programming workflow:
>
> 1. Import data
> 2. Declare variables and constants (set.seed() typically goes here)
> 3. Define functions
> 4. Body of code
> 5. Generate output
> 6. Clean up (set.seed(NULL) would go here, along with removing unused
> variables and such)
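The workflow above might look like the following skeleton. It is only a sketch: the built-in `mtcars` data stand in for "import data", and `bootstrap_mean()` is a hypothetical helper chosen because it uses random numbers.

```r
# 1. Import data (built-in mtcars used as a stand-in)
dat <- mtcars

# 2. Declare variables and constants, including the seed
seed   <- 42L
n_boot <- 200L
set.seed(seed)

# 3. Define functions
bootstrap_mean <- function(x, n_boot) {
  # mean of each bootstrap resample of x
  replicate(n_boot, mean(sample(x, replace = TRUE)))
}

# 4. Body of code
boot_means <- bootstrap_mean(dat$mpg, n_boot)

# 5. Generate output
print(quantile(boot_means, c(0.025, 0.975)))

# 6. Clean up: drop the fixed seed and remove objects no longer needed
set.seed(NULL)
rm(dat, n_boot)
```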
>
>
>
> Regards,
>
> Tim
>
>
>
> *From:* Neha gupta <neha.bologna90 using gmail.com>
> *Sent:* Tuesday, March 22, 2022 10:48 AM
> *To:* Ebert,Timothy Aaron <tebert using ufl.edu>
> *Cc:* Jeff Newmiller <jdnewmil using dcn.davis.ca.us>; r-help using r-project.org
> *Subject:* Re: How important is set.seed
>
>
>
>
>
> Hello Tim
>
>
>
> In some of the tutorial examples I have seen, the random seed is set just
> before model training, e.g., just before the train() function from the
> caret package. Should I follow this?
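Setting the seed immediately before a stochastic step makes that step reproducible regardless of how many random numbers earlier code consumed. In this base-R sketch, `assign_folds()` is a hypothetical stand-in for the random fold assignment a training call performs; it is not caret's code. (caret's `trainControl()` also has a `seeds` argument for fixing the seed of each resample, which matters when training runs in parallel.)

```r
# Hypothetical stand-in for the random fold assignment inside a training call
assign_folds <- function(n, k = 10) sample(rep(seq_len(k), length.out = n))

# Run 1: seed set right before the stochastic step
set.seed(123)
folds_1 <- assign_folds(100)

# Run 2: extra random draws happen earlier (e.g., a different train/test split)
invisible(runif(7))
set.seed(123)
folds_2 <- assign_folds(100)

identical(folds_1, folds_2)   # TRUE: the earlier draws no longer matter
```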
>
>
>
> Best regards
> On Tuesday, March 22, 2022, Ebert,Timothy Aaron <tebert using ufl.edu> wrote:
>
> Ah, so maybe what you need is to think of “set.seed()” as a treatment in
> an experiment. You could use a random number generator to select an
> appropriate number of seeds, then use those seeds repeatedly in the
> different models to see how seed selection influences outcomes. I am not
> quite sure how many seeds would constitute a good sample. For me that would
> depend on what I find and how long a run takes.
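A minimal sketch of treating the seed as an experimental factor: draw a set of seeds once, then reuse each seed for every model so the comparison is paired. `kmeans()` on the built-in `iris` measurements stands in for a stochastic model, and the two "models" (3 versus 4 clusters) are illustrative assumptions only.

```r
set.seed(2022)
seeds <- sample.int(10000, 5)   # the seeds themselves are drawn at random

fit_model <- function(seed, centers) {
  set.seed(seed)                 # same seed for every model -> paired comparison
  kmeans(iris[, 1:4], centers = centers)$tot.withinss
}

results <- data.frame(
  seed    = seeds,
  model_a = sapply(seeds, fit_model, centers = 3),
  model_b = sapply(seeds, fit_model, centers = 4)
)
print(results)

# Spread across seeds indicates how much each outcome depends on the seed
sapply(results[, c("model_a", "model_b")], sd)
```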
>
> In parallel processing, you set the seed in the master process and then use
> a random number generator to set the seeds in each worker.
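A sketch of that pattern: the master sets one seed, draws a distinct seed for each worker, and each worker seeds itself before doing its work. `lapply()` stands in for a parallel map so the example runs anywhere; `worker_task()` is a hypothetical job.

```r
set.seed(42)                                    # seed the master once
n_workers    <- 4
worker_seeds <- sample.int(.Machine$integer.max, n_workers)  # distinct seeds

worker_task <- function(seed) {
  set.seed(seed)                                # each worker gets its own seed
  mean(rnorm(1000))
}

# lapply() stands in for a parallel map such as parallel::parLapply()
worker_results <- lapply(worker_seeds, worker_task)
```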
>
> Tim
>
>
>
> *From:* Neha gupta <neha.bologna90 using gmail.com>
> *Sent:* Tuesday, March 22, 2022 6:33 AM
> *To:* Ebert,Timothy Aaron <tebert using ufl.edu>
> *Cc:* Jeff Newmiller <jdnewmil using dcn.davis.ca.us>; r-help using r-project.org
> *Subject:* Re: How important is set.seed
>
>
>
>
> Thank you all.
>
>
>
> Actually, I need set.seed because I have to evaluate the consistency of the
> feature selection produced by different models, so I think a fixed seed is
> recommended for this.
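One way to quantify that consistency is the Jaccard index between the feature sets selected on different runs. A sketch, assuming a hypothetical selector `select_features()` that keeps the top three predictors by absolute correlation with `mpg` on a bootstrap resample of `mtcars`; a real model's selection step would replace it.

```r
# Hypothetical selector: top-k predictors by |correlation| on a bootstrap resample
select_features <- function(data, outcome, k = 3) {
  boot <- data[sample(nrow(data), replace = TRUE), ]
  r <- abs(cor(boot[, setdiff(names(boot), outcome)], boot[[outcome]]))[, 1]
  names(sort(r, decreasing = TRUE))[seq_len(k)]
}

# Jaccard index: size of the intersection over size of the union
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

set.seed(123)
runs <- lapply(1:10, function(i) select_features(mtcars, "mpg"))

# Mean pairwise Jaccard similarity across all runs (1 = identical selections)
pairs <- combn(length(runs), 2)
stability <- mean(apply(pairs, 2, function(p) jaccard(runs[[p[1]]], runs[[p[2]]])))
stability
```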
>
>
>
> Warm regards
>
> On Tuesday, March 22, 2022, Ebert,Timothy Aaron <tebert using ufl.edu> wrote:
>
> If you are using the program for data analysis, then set.seed() is not
> necessary unless you are developing a reproducible example. In a standard
> analysis it is mostly counterproductive, because one should then ask whether
> your presented results are an artifact of a specific seed that you selected
> to get a particular result. However, when you need a reproducible example,
> are debugging a program, or have other specific cases where you need the
> same result with every run of the program, set.seed() is an
> essential tool.
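For a reproducible example or a debugging session, it can help to fix the seed for one piece of code only and leave the global random stream untouched afterwards. A base-R sketch; `with_local_seed()` is a hypothetical helper (the withr package provides `with_seed()` for the same purpose).

```r
with_local_seed <- function(seed, expr) {
  env <- globalenv()
  had_seed <- exists(".Random.seed", envir = env, inherits = FALSE)
  old_seed <- if (had_seed) get(".Random.seed", envir = env) else NULL
  on.exit(
    # restore the caller's random stream (or remove the seed if there was none)
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = env)
    } else if (exists(".Random.seed", envir = env, inherits = FALSE)) {
      rm(".Random.seed", envir = env)
    }
  )
  set.seed(seed)
  expr   # lazily evaluated here, after the seed has been set
}

# Usage: the example is reproducible, and the surrounding stream is unaffected
set.seed(1)
before <- runif(1)
example_draws <- with_local_seed(99, rnorm(3))
after <- runif(1)
```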
> Tim
>
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Jeff Newmiller
> Sent: Monday, March 21, 2022 8:41 PM
> To: r-help using r-project.org; Neha gupta <neha.bologna90 using gmail.com>; r-help
> mailing list <r-help using r-project.org>
> Subject: Re: [R] How important is set.seed
>
>
> First off, "ML models" do not all use random numbers (for prediction I
> would guess very few of them do). Learn and pay attention to what the
> functions you are using do.
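One way to check whether a function consumes random numbers is to compare `.Random.seed` before and after calling it. A base-R sketch; `uses_rng()` is a hypothetical helper, and `lm()` and `kmeans()` are simply convenient examples of a deterministic and a randomized fit. Note that the helper resets the global seed as a side effect.

```r
uses_rng <- function(f) {
  set.seed(1)                                   # make sure .Random.seed exists
  before <- get(".Random.seed", envir = globalenv())
  f()
  after <- get(".Random.seed", envir = globalenv())
  !identical(before, after)                     # TRUE if random numbers were drawn
}

uses_rng(function() lm(mpg ~ wt, data = mtcars))           # FALSE
uses_rng(function() kmeans(mtcars[, c("mpg", "wt")], 2))   # TRUE
```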
>
> Second, if you use random numbers properly and understand the precision
> that your specific use case offers, then you don't need to use set.seed.
> However, in practice, using set.seed can allow you to temporarily avoid
> chasing precision gremlins, or set up specific test cases for testing code,
> not results. It is your responsibility to not let this become a crutch... a
> randomized simulation that is actually sensitive to the seed is unlikely to
> offer an accurate result.
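A sketch of what "understanding the precision" can mean for a simulation: a Monte Carlo estimate comes with a standard error, and changing the seed moves the estimate by an amount on that order. The target (the mean of an Exponential(1) variable, which is 1) and the sample size are illustrative assumptions.

```r
mc_estimate <- function(n) {
  x <- rexp(n, rate = 1)                        # true mean is 1
  c(estimate = mean(x), se = sd(x) / sqrt(n))   # Monte Carlo standard error
}

n <- 10000
set.seed(1); run1 <- mc_estimate(n)
set.seed(2); run2 <- mc_estimate(n)

# The estimates differ by roughly the standard error (about 0.01 here);
# a conclusion that flips between seeds is not supported at this precision.
rbind(run1, run2)
```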
>
> Where to put set.seed depends a lot on how you are performing your
> simulations. In general each process should set it once uniquely at the
> beginning, and if you use parallel processing then use the features of your
> parallel processing framework to ensure that this happens. Beware of
> setting all worker processes to use the same seed.
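A base-R sketch of both points, using the `parallel` package that ships with R: workers seeded identically produce identical "random" draws, while `nextRNGStream()` derives separate L'Ecuyer-CMRG streams from one master seed (this is what `clusterSetRNGStream()` does for a cluster). The workers are simulated sequentially here so the example runs without a cluster.

```r
library(parallel)   # ships with R; provides nextRNGStream()

# Pitfall: every "worker" uses the same seed
same_seed <- lapply(1:3, function(i) { set.seed(123); runif(2) })

# Better: one master seed, then a separate stream per worker
RNGkind("L'Ecuyer-CMRG")
set.seed(123)
stream  <- .Random.seed
streams <- vector("list", 3)
for (i in 1:3) {
  streams[[i]] <- stream
  stream <- nextRNGStream(stream)               # advance to the next independent stream
}
distinct <- lapply(streams, function(s) {
  assign(".Random.seed", s, envir = globalenv())  # what each worker would do
  runif(2)
})

RNGkind("default")   # return to R's default generator
```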
>
> On March 21, 2022 5:03:30 PM PDT, Neha gupta <neha.bologna90 using gmail.com>
> wrote:
> >Hello everyone
> >
> >I want to know
> >
> >(1) In which cases do we need to use set.seed when building ML models?
> >
> >(2) Where exactly should we put the set.seed call, i.e., when we split
> >data into train/test sets, or just before we train a model?
> >
> >Thank you
> >
> >
> >______________________________________________
> >R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>
> --
> Sent from my phone. Please excuse my brevity.
>



