[R] Random Seed Location
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Tue Feb 27 02:56:37 CET 2018
I am willing to go out on that limb and say that the answer to the OP's question is yes: the random number (RN) sequence in R should be reproducible. I agree, though, that he does not appear to be taking care to avoid running code that would disturb the generator.
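Jeff's point, that the sequence is reproducible unless other code disturbs the generator, can be checked directly. Save R's generator state (.Random.seed) right after set.seed() and compare it just before the random step. This is a minimal base-R sketch. rng_unchanged() is a hypothetical helper, and runif(1) is only a stand-in for whatever code might draw a random number in between.

```r
# Snapshot the generator state immediately after seeding.
set.seed(654321)
saved_state <- .Random.seed

# Hypothetical helper: TRUE if nothing has drawn from the generator
# since the snapshot was taken.
rng_unchanged <- function(saved) {
  identical(get(".Random.seed", envir = globalenv()), saved)
}

ok_before <- rng_unchanged(saved_state)  # TRUE: no draws yet

invisible(runif(1))  # stand-in for any code that consumes a random number

ok_after <- rng_unchanged(saved_state)   # FALSE: the state has moved on
```

In the OP's script, the snapshot would be taken right after set.seed(654321) and checked just before sample() in step D.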
On February 26, 2018 4:30:47 PM PST, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>In case you don't get an answer from someone more knowledgeable:
>
>1. I don't know.
>2. But it is possible that other packages loaded after set.seed()
>fool with the RNG.
>3. So, to be safe, I would call set.seed() immediately before each
>step that generates random numbers.
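Bert's third point can be illustrated with a short base-R sketch. It makes no assumption about which package, if any, touches the generator. Here runif(1) merely stands in for any intervening code that draws a random number. Re-seeding immediately before sample() restores the original partition.

```r
n <- 5000

# Seed, then partition immediately: the reference result.
set.seed(654321)
train_a <- sample(1:n, size = 0.6 * n, replace = FALSE)

# Seed, run other code that draws from the generator, then partition:
# the partition changes.
set.seed(654321)
invisible(runif(1))  # stand-in for code that consumes a random number
train_b <- sample(1:n, size = 0.6 * n, replace = FALSE)

# Re-seed immediately before sampling, as Bert suggests.
set.seed(654321)
invisible(runif(1))
set.seed(654321)
train_c <- sample(1:n, size = 0.6 * n, replace = FALSE)

identical(train_a, train_b)  # FALSE
identical(train_a, train_c)  # TRUE
```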
>
>Cheers,
>Bert
>
>Bert Gunter
>
>"The trouble with having an open mind is that people keep coming along and
>sticking things into it."
>-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
>On Mon, Feb 26, 2018 at 3:25 PM, Gary Black <gwblack001 at sbcglobal.net> wrote:
>
>> Hi all,
>>
>> For some odd reason, when running naive Bayes, k-NN, etc., I get slightly
>> different results (e.g., error rates, classification probabilities) from
>> run to run even though I am using the same random seed.
>>
>> Nothing else (input-wise) is changing, but my results differ somewhat
>> from run to run. The only randomness should be in the partitioning, and I
>> have set the seed before that point.
>>
>> My question is simply this: should the location of the set.seed() command
>> matter, provided that it is applied before any commands that involve
>> randomness (such as partitioning)?
>>
>> If you need to see the code, it is below:
>>
>> Thank you,
>> Gary
>>
>>
>> A. Separate the original (in-sample) data from the new (out-of-sample)
>> data. Set a random seed.
>>
>> > InvestTech <- as.data.frame(InvestTechRevised)
>> > outOfSample <- InvestTech[5001:nrow(InvestTech), ]
>> > InvestTech <- InvestTech[1:5000, ]
>> > set.seed(654321)
>>
>> B. Install and load the caret, ggplot2 and e1071 packages.
>>
>> > install.packages("caret")
>> > install.packages("ggplot2")
>> > install.packages("e1071")
>> > library(caret)
>> > library(ggplot2)
>> > library(e1071)
>>
>> C. Bin each predictor variable into 20 bins with approximately equal
>> counts, using the cut_number function from the ggplot2 package.
>>
>> > InvestTech[, 1] <- cut_number(InvestTech[, 1], n = 20)
>> > InvestTech[, 2] <- cut_number(InvestTech[, 2], n = 20)
>> > outOfSample[, 1] <- cut_number(outOfSample[, 1], n = 20)
>> > outOfSample[, 2] <- cut_number(outOfSample[, 2], n = 20)
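For reference, ggplot2 documents cut_number() as making n groups with approximately equal numbers of observations. The sketch below reproduces that idea in base R using quantile break points. bin_equal_count() is a hypothetical helper, and the quantile-based approach is an assumption for illustration, not ggplot2's own code. Because the sketch uses no random numbers, repeated calls on the same data give identical bins.

```r
# Hypothetical helper: split a numeric vector into n bins with roughly
# equal counts, using sample quantiles as the break points.
bin_equal_count <- function(x, n) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1), names = FALSE)
  if (anyDuplicated(breaks)) {
    stop("Too few distinct values to form ", n, " bins")
  }
  cut(x, breaks = breaks, include.lowest = TRUE)
}

x <- c(3.1, 7.4, 1.2, 9.8, 5.5, 2.6, 8.3, 4.7, 6.9, 0.4)
bins <- bin_equal_count(x, n = 5)
table(bins)                                  # 2 observations in each bin
identical(bins, bin_equal_count(x, n = 5))   # TRUE: no randomness involved
```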
>>
>> D. Partition the original (in-sample) data into 60% training and 40%
>> validation sets.
>>
>> > n <- nrow(InvestTech)
>> > train <- sample(1:n, size = 0.6 * n, replace = FALSE)
>> > InvestTechTrain <- InvestTech[train, ]
>> > InvestTechVal <- InvestTech[-train, ]
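One way to keep the seed and the partition together is to seed inside the function that draws the split, so that nothing run in between can shift it. This is a minimal sketch. split_train_val() is a hypothetical name, the toy data frame is made up, and frac = 0.6 mirrors the 60/40 split in step D.

```r
# Hypothetical helper: 60/40 split of a data frame's rows, seeded inside
# the function so that intervening code cannot shift the partition.
split_train_val <- function(df, seed, frac = 0.6) {
  set.seed(seed)
  n <- nrow(df)
  train <- sample(1:n, size = frac * n, replace = FALSE)
  list(train = df[train, , drop = FALSE], val = df[-train, , drop = FALSE])
}

df <- data.frame(id = 1:10, y = rep(c(0, 1), 5))
s1 <- split_train_val(df, seed = 654321)
invisible(runif(3))                       # unrelated draws in between
s2 <- split_train_val(df, seed = 654321)

identical(s1, s2)                                # TRUE
c(nrow(s1$train), nrow(s1$val))                  # 6 4
length(intersect(s1$train$id, s1$val$id)) == 0   # TRUE: disjoint sets
```

Note that calling set.seed() inside a function also resets the global generator as a side effect, which any later random step will see.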
>>
>> E. Use the naiveBayes function in the e1071 package to fit the model.
>>
>> > model <- naiveBayes(`Purchase (1=yes, 0=no)` ~ ., data = InvestTechTrain)
>> > prob <- predict(model, newdata = InvestTechVal, type = "raw")
>> > pred <- ifelse(prob[, 2] >= 0.3, 1, 0)
>>
>> F. Use the confusionMatrix function in the caret package to output the
>> confusion matrix.
>>
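>> > # caret expects predictions and reference as factors with the same levels
>> > confMtr <- confusionMatrix(factor(pred, levels = c(0, 1)),
>> +                            factor(unlist(InvestTechVal[, 3]), levels = c(0, 1)),
>> +                            mode = "everything", positive = "1")
>> > accuracy <- confMtr$overall[1]
>> > valError <- 1 - accuracy
>> > confMtr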
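The accuracy that step F reads from confMtr$overall[1] is the proportion of correct predictions, and the validation error is its complement. A base-R sketch on made-up labels shows the arithmetic. The vectors pred and actual here are hypothetical and not from the InvestTech data.

```r
# Hypothetical 0/1 predictions and true labels, for illustration only.
pred   <- c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0)
actual <- c(1, 0, 1, 1, 0, 0, 0, 0, 1, 0)

# Cross-tabulate with fixed levels so both classes always appear.
cm <- table(Predicted = factor(pred, levels = c(0, 1)),
            Actual    = factor(actual, levels = c(0, 1)))

accuracy  <- sum(diag(cm)) / sum(cm)  # 0.8 (8 of 10 correct)
val_error <- 1 - accuracy             # 0.2
```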
>>
>> G. Classify the 18 new (out-of-sample) readers using the following code.
>> > prob <- predict(model, newdata = outOfSample, type = "raw")
>> > pred <- ifelse(prob[, 2] >= 0.3, 1, 0)
>> > cbind(pred, prob, outOfSample[, -3])
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>