[R] Random Seed Location
Gary Black
gwblack001 at sbcglobal.net
Tue Feb 27 00:25:56 CET 2018
Hi all,
For some odd reason, when running naive Bayes, k-NN, etc., I get slightly
different results (e.g., error rates, classification probabilities) from run
to run even though I am using the same random seed.
Nothing else (input-wise) is changing, but my results are somewhat different
from run to run. The only randomness should be in the partitioning, and I
have set the seed before this point.
My question simply is: should the location of the set.seed command matter,
provided that it is applied before any commands which involve randomness
(such as partitioning)?
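
To make the question concrete, here is a minimal base R sketch. The helper
name draw_partition, the toy size n = 100 and the filler argument are
hypothetical and are not part of the analysis below. It illustrates the
general behaviour of R's random number generator: deterministic commands
between set.seed and sample do not change the draw, whereas any extra
random draw in between advances the generator state and would normally
change the partition.

```r
# Hypothetical helper: set the seed, do some work, then draw a 60% partition.
draw_partition <- function(seed, n = 100, filler = FALSE) {
  set.seed(seed)
  x <- seq_len(n) * 2      # deterministic work: does not touch the RNG
  if (filler) runif(1)     # an extra random draw advances the RNG state
  sample(seq_len(n), size = floor(0.6 * n), replace = FALSE)
}

a <- draw_partition(654321)
b <- draw_partition(654321)
d <- draw_partition(654321, filler = TRUE)

identical(a, b)   # same seed, no intervening random draws
identical(a, d)   # an intervening runif() call consumed random numbers
```

If two runs of a full script disagree, comparing the sampled indices
themselves in this way is one way to check whether the partition is the
part that differs.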
If you need to see the code, it is below:
Thank you,
Gary
A. Separate the original (in-sample) data from the new (out-of-sample)
data. Set a random seed.
> InvestTech <- as.data.frame(InvestTechRevised)
> outOfSample <- InvestTech[5001:nrow(InvestTech), ]
> InvestTech <- InvestTech[1:5000, ]
> set.seed(654321)
B. Install and load the caret, ggplot2 and e1071 packages.
> install.packages("caret")
> install.packages("ggplot2")
> install.packages("e1071")
> library(caret)
> library(ggplot2)
> library(e1071)
C. Bin the predictor variables with approximately equal counts using
the cut_number function from the ggplot2 package. We will use 20 bins.
> InvestTech[, 1] <- cut_number(InvestTech[, 1], n = 20)
> InvestTech[, 2] <- cut_number(InvestTech[, 2], n = 20)
> outOfSample[, 1] <- cut_number(outOfSample[, 1], n = 20)
> outOfSample[, 2] <- cut_number(outOfSample[, 2], n = 20)
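
As an illustration of what equal-count binning does, the following base R
sketch bins a toy vector at its quantiles. It is only an approximation of
cut_number written for illustration, using hypothetical data and 4 bins
instead of 20. It involves no random numbers, so this step on its own
should give the same bins on every run.

```r
# Toy data (hypothetical), binned into 4 groups of approximately equal size.
x <- 1:100
n_bins <- 4

# Break points at equally spaced quantiles of x.
breaks <- quantile(x, probs = seq(0, 1, length.out = n_bins + 1))

# include.lowest = TRUE keeps the minimum value inside the first bin.
bins <- cut(x, breaks = breaks, include.lowest = TRUE)

table(bins)
```

Because the break points come from the quantiles of whichever data are
passed in, binning two data sets separately generally yields different
interval boundaries for each.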
D. Partition the original (in-sample) data into 60% training and 40%
validation sets.
> n <- nrow(InvestTech)
> train <- sample(1:n, size = 0.6 * n, replace = FALSE)
> InvestTechTrain <- InvestTech[train, ]
> InvestTechVal <- InvestTech[-train, ]
E. Use the naiveBayes function in the e1071 package to fit the model.
> model <- naiveBayes(`Purchase (1=yes, 0=no)` ~ ., data = InvestTechTrain)
> prob <- predict(model, newdata = InvestTechVal, type = "raw")
> pred <- ifelse(prob[, 2] >= 0.3, 1, 0)
F. Use the confusionMatrix function in the caret package to output the
confusion matrix.
> confMtr <- confusionMatrix(pred, unlist(InvestTechVal[, 3]), mode =
"everything", positive = "1")
> accuracy <- confMtr$overall[1]
> valError <- 1 - accuracy
> confMtr
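
For reference, the accuracy and validation error can also be computed by
hand from a cross-tabulation. The sketch below uses base R only and
hypothetical toy labels, not the InvestTech data, so the numbers are purely
illustrative.

```r
# Hypothetical toy labels with a fixed level order (0 = no, 1 = yes).
actual <- factor(c(1, 0, 1, 1, 0, 0, 1, 0), levels = c(0, 1))
pred   <- factor(c(1, 0, 0, 1, 0, 1, 1, 0), levels = c(0, 1))

# Rows are predictions, columns are the true classes.
tab <- table(Predicted = pred, Actual = actual)

# Accuracy is the share of cases on the diagonal; the error is its complement.
accuracy <- sum(diag(tab)) / sum(tab)
valError <- 1 - accuracy

tab
accuracy   # 0.75 for these toy labels
valError   # 0.25 for these toy labels
```

Printing these values for two runs with the same seed gives a quick way to
see whether the differences arise before or after the prediction step.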
G. Classify the 18 new (out-of-sample) readers using the following
code.
> prob <- predict(model, newdata = outOfSample, type = "raw")
> pred <- ifelse(prob[, 2] >= 0.3, 1, 0)
> cbind(pred, prob, outOfSample[, -3])