[R] Ranger could not work with caret
Neha gupta
neh@@bo|ogn@90 @end|ng |rom gm@||@com
Fri Jul 1 21:18:54 CEST 2022
@Rui Barradas <ruipbarradas using sapo.pt>
Thank you again for the useful explanation.
Best regards
On Fri, Jul 1, 2022 at 8:26 PM Rui Barradas <ruipbarradas using sapo.pt> wrote:
> Hello,
>
> The error doesn't arise in randomForest because package randomForest has a
> function, tuneRF, that looks for the best mtry (best relative to the OOB
> error estimate), and it's this value that it uses.
>
> The question's code gives Ranger errors but it also gives R warnings:
>
> Warning messages:
> 1: model fit failed for Fold01: mtry=48, min.node.size=5,
> splitrule=variance Error in ranger::ranger(dependent.variable.name =
> ".outcome", data = x, :
> User interrupt or internal error.
>
>
> As you can see, mtry=48 is twice ncol(tr), when it should *never* be
> greater than the number of variables in the data set. Why it is using
> this value, I don't know. A function bug? Ask the package maintainer?
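The thread does not settle where mtry=48 comes from. One thing worth checking is that a formula such as act_effort ~ . expands every factor into dummy columns, so the number of predictor columns a model sees can be much larger than ncol(tr). The pre-processing summary quoted further down (centered (48), scaled (48), remove (58)) already reports more columns than the 23 predictors in tr. A minimal base-R sketch on a hypothetical toy data frame (the names toy, size, rating and effort are illustrative, not from the thread):

```r
# Hypothetical toy data: one numeric predictor, one 4-level factor, one outcome
toy <- data.frame(
  size   = c(1.2, 3.4, 2.2, 5.1, 0.7, 4.4),
  rating = factor(c("l", "n", "h", "vh", "n", "l"),
                  levels = c("l", "n", "h", "vh")),
  effort = c(10, 30, 22, 51, 8, 40)
)

# Predictors as columns of the data frame (everything except the outcome)
n_data_predictors <- ncol(toy) - 1

# Predictors after the formula expands the factor into dummy columns;
# the first column of model.matrix() is the intercept, so drop it
mm <- model.matrix(effort ~ ., data = toy)[, -1, drop = FALSE]
n_model_predictors <- ncol(mm)

n_data_predictors   # [1] 2
n_model_predictors  # [1] 4  (size plus 3 dummies for rating)
```

Whether caret builds its default mtry grid from such an expanded column count, and whether the "nzv" filter then removes columns below that mtry, is an assumption to verify against caret's source.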
>
> And, by the way, package caret does, or can do, a grid search for optimal
> parameter values. If that is giving errors and you are calling rf
> directly, why bother with caret's error? Use the original function. Here
> is an example with tuneRF. Setting argument doBest to TRUE, you get
> both the optimal value for mtry and the fitted random forest. 2 in 1.
>
>
> library(randomForest)
> # randomForest 4.7-1.1
> # Type rfNews() to see new features/changes/bug fixes.
>
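> c2 <- tuneRF(
> x = tr[-ncol(tr)], # predictors: every column except the last (act_effort)
> y = tr$act_effort, # the response
> mtryStart = ncol(tr)/2, # start the search at 24/2 = 12
> doBest = TRUE # also fit and return the forest with the best mtry
> )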
> # mtry = 12 OOB error = 139920.7
> # Searching left ...
> # mtry = 6 OOB error = 170909.3
> # -0.2214729 0.05
> # Searching right ...
> # mtry = 23 OOB error = 128566.7
> # 0.08114586 0.05
>
> c2
> #
> # Call:
> # randomForest(x = x, y = y, mtry = res[which.min(res[, 2]), 1])
> # Type of random forest: regression
> # Number of trees: 500
> # No. of variables tried at each split: 23
> #
> # Mean of squared residuals: 129734.8
> # % Var explained: 39.98
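The trace above shows the search tuneRF performs: it starts at mtryStart, steps left by dividing by stepFactor (2 by default) and right by multiplying by it, keeps going while the relative OOB improvement exceeds improve (0.05 by default), and never steps past the number of columns (12 * 2 = 24 was capped at 23). It then refits with the mtry of smallest error, as the Call line res[which.min(res[, 2]), 1] shows. The sketch below is a simplified, hypothetical re-implementation of that logic in base R, not randomForest's own code; step_search and oob_error are illustrative names, and the OOB errors are just the values printed in the trace.

```r
# Hypothetical helper: a simplified re-implementation of the step search
# shown in the tuneRF trace; oob_error maps an mtry value to an OOB error.
step_search <- function(mtry_start, p, oob_error,
                        step_factor = 2, improve = 0.05) {
  errors <- numeric(0)                  # named vector: mtry -> OOB error
  err_start <- oob_error(mtry_start)
  errors[as.character(mtry_start)] <- err_start

  for (direction in c("left", "right")) {
    err_old  <- err_start
    mtry_cur <- mtry_start
    gain     <- Inf
    while (gain >= improve) {
      mtry_old <- mtry_cur
      # Step down (never below 1) or up (never above p columns)
      mtry_cur <- if (direction == "left") {
        max(1, ceiling(mtry_cur / step_factor))
      } else {
        min(p, floor(mtry_cur * step_factor))
      }
      if (mtry_cur == mtry_old) break   # cannot move any further
      err_cur <- oob_error(mtry_cur)
      errors[as.character(mtry_cur)] <- err_cur
      gain <- 1 - err_cur / err_old     # relative improvement in OOB error
      if (gain > improve) err_old <- err_cur
    }
  }

  tried <- sort(as.numeric(names(errors)))
  list(tried = tried,
       best  = tried[which.min(errors[as.character(tried)])])
}

# OOB errors copied from the trace above (x has 23 predictor columns)
trace_err <- c(`6` = 170909.3, `12` = 139920.7, `23` = 128566.7)
res <- step_search(12, p = 23,
                   oob_error = function(m) trace_err[[as.character(m)]])
res$tried  # [1]  6 12 23
res$best   # [1] 23
```

Because the right-hand step is clamped with min(p, ...), this kind of search cannot propose an mtry larger than the number of columns it was given.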
>
>
> Hope this helps,
>
> Rui Barradas
>
>
>
> At 17:18 on 01/07/2022, Neha gupta wrote:
> > Thank you so much for your help. I hope it will work.
> >
> > However, why doesn't the same error arise when I use rf? They both
> > have the same parameters and the same default values.
> >
> > Best regards
> >
> > On Friday, July 1, 2022, Rui Barradas <ruipbarradas using sapo.pt> wrote:
> >
> > Hello,
> >
> > The error comes from the ranger parameter mtry becoming greater than the
> > number of variables (columns).
> > mtry can be set manually with caret::train's argument tuneGrid. But for
> > method "ranger" the grid must also include the split rule and the minimum node size.
> >
> >
> > library(caret)
> > library(farff)
> >
> > boot <- trainControl(method = "cv", number = 10)
> >
> > # set the maximum mtry manually to ncol(tr)
> > # this creates a sequence of mtry values
> > mtry <- var_seq(ncol(tr), len = 3) # 3 is the default value
> > mtry
> > # [1] 2 13 24
> >
> > splitrule <- c("variance", "extratrees")
> > min.node.size <- 1:10
> > mtrygrid <- expand.grid(mtry, splitrule, min.node.size)
> > names(mtrygrid) <- c("mtry", "splitrule", "min.node.size")
> >
> > c1 <- train(act_effort ~ ., data = tr,
> > method = "ranger",
> > tuneLength = 5, # ignored here, since tuneGrid is supplied
> > metric = "MAE",
> > preProc = c("center", "scale", "nzv"),
> > tuneGrid = mtrygrid,
> > trControl = boot)
> > c1
> > # Random Forest
> > #
> > # 30 samples
> > # 23 predictors
> > #
> > # Pre-processing: centered (48), scaled (48), remove (58)
> > # Resampling: Cross-Validated (10 fold)
> > # Summary of sample sizes: 28, 27, 27, 28, 27, 27, ...
> > # Resampling results across tuning parameters:
> > #
> > # mtry splitrule min.node.size RMSE Rsquared MAE
> > # 2 variance 1 256.6391 0.8103759 186.3609
> > # 2 variance 2 249.7120 0.8628109 183.6696
> > # 2 variance 3 258.8240 0.8284449 189.0712
> > #
> > # [...omit...]
> > #
> > # 13 extratrees 10 254.9569 0.8918014 191.2524
> > # 24 variance 1 177.7188 0.9458652 112.2800
> > # 24 variance 2 172.6826 0.9204287 108.5943
> > # 24 variance 3 172.9954 0.9271006 109.2554
> > # 24 variance 4 172.2467 0.9523067 110.0776
> > # 24 variance 5 175.2485 0.9283317 112.8798
> > # 24 variance 6 177.9285 0.9369881 115.8970
> > # 24 variance 7 180.5959 0.9485035 117.5816
> > # 24 variance 8 178.8037 0.9358033 117.8725
> > # 24 variance 9 176.5849 0.9210959 117.0055
> > # 24 variance 10 178.6439 0.9257969 119.8035
> > # 24 extratrees 1 219.1368 0.8801770 141.0720
> > # 24 extratrees 2 216.1900 0.8550002 140.9263
> > # 24 extratrees 3 212.4138 0.8979379 141.4282
> > # 24 extratrees 4 218.2631 0.9121471 146.2908
> > # 24 extratrees 5 212.5679 0.9279598 144.2715
> > # 24 extratrees 6 218.9856 0.9141754 152.2099
> > # 24 extratrees 7 222.8540 0.9412682 152.4614
> > # 24 extratrees 8 228.1156 0.9423414 161.8456
> > # 24 extratrees 9 226.6182 0.9408306 160.5264
> > # 24 extratrees 10 226.9280 0.9429413 165.6878
> > #
> > # MAE was used to select the optimal model using the smallest value.
> > # The final values used for the model were mtry = 24, splitrule = variance
> > # and min.node.size = 2.
> > plot(c1)
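For reference, the same kind of tuning grid can be built with named arguments to expand.grid, which avoids the separate names() call. The sketch below reproduces the mtry sequence 2, 13, 24 with seq() rather than caret::var_seq, and then, purely as a hypothetical, assumes only p = 20 predictor columns remain after pre-processing and drops any mtry above that with a small helper, cap_mtry, which is illustrative and not part of caret.

```r
# Evenly spaced mtry candidates from 2 to 24, matching the var_seq() output above
mtry <- floor(seq(2, 24, length.out = 3))

# All three ranger tuning parameters, with column names set directly
grid <- expand.grid(
  mtry = mtry,
  splitrule = c("variance", "extratrees"),
  min.node.size = 1:10
)

# Hypothetical helper: keep only rows whose mtry does not exceed p columns
cap_mtry <- function(grid, p) {
  grid[grid$mtry <= p, , drop = FALSE]
}

# Hypothetical: suppose only 20 predictor columns are left after pre-processing
capped <- cap_mtry(grid, p = 20)
nrow(grid)         # [1] 60
nrow(capped)       # [1] 40
max(capped$mtry)   # [1] 13
```

The capped data frame has the same columns as mtrygrid, so it could be passed as tuneGrid in its place; the right value of p is whatever number of columns actually reaches ranger.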
> >
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> >
> > At 23:03 on 30/06/2022, Neha gupta wrote:
> >
> > OK, the data are pasted below.
> >
> > But on the same data (everything else the same), other models
> > such as RF and SVM work fine.
> >
> > > dput(head(tr, 30))
> > structure(list(recordnumber = c(0, 0.02, 0.04, 0.06, 0.07, 0.08,
> > 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.16, 0.17, 0.18, 0.23, 0.24,
> > 0.25, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.35, 0.36, 0.37, 0.38,
> > 0.4, 0.41), projectname = structure(c(1L, 1L, 1L, 1L, 2L, 3L,
> > 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
> > 4L, 4L, 4L, 4L, 4L, 4L, 5L, 6L), levels = c("de", "erb", "gal",
> > "X", "hst", "slp", "spl", "Y"), class = "factor"), cat2 =
> > structure(c(3L,
> > 3L, 3L, 3L, 3L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L,
> > 9L, 11L, 5L, 4L, 6L, 8L, 3L, 9L, 9L, 9L, 9L, 6L, 7L), levels =
> > c("Avionics",
> > "application_ground", "avionicsmonitoring",
> > "batchdataprocessing",
> > "communications", "datacapture", "launchprocessing",
> > "missionplanning",
> > "monitor_control", "operatingsystem", "realdataprocessing",
> > "science",
> > "simulation", "utility"), class = "factor"), forg =
> > structure(c(2L,
> > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels =
> > c("f",
> > "g"), class = "factor"), center = structure(c(2L, 2L, 2L, 2L,
> > 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
> > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 6L), levels = c("1", "2",
> > "3", "4", "5", "6"), class = "factor"), year = c(0.5, 0.5, 0.5,
> > 0.5, 0.6875, 0.5625, 0.5625, 0.8125, 0.5625, 0.875, 0.5625, 0.75,
> > 0.5625, 0.8125, 0.75, 0.9375, 0.9375, 0.9375, 0.6875, 0.6875,
> > 0.6875, 0.6875, 0.875, 1, 0.9375, 0.9375, 0.9375, 0.9375, 0.5625,
> > 0.25), mode = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> > 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> > 3L, 3L, 3L, 3L, 3L), levels = c("embedded", "organic",
> > "semidetached"
> > ), class = "factor"), rely = structure(c(4L, 4L, 4L, 4L, 4L,
> > 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 3L, 3L, 3L, 3L,
> > 3L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 4L), levels = c("vl", "l", "n",
> > "h", "vh", "xh"), class = "factor"), data = structure(c(2L, 2L,
> > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
> > 5L, 5L, 5L, 5L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 2L), levels = c("vl",
> > "l", "n", "h", "vh", "xh"), class = "factor"), cplx =
> > structure(c(4L,
> > 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 4L,
> > 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), levels =
> > c("vl",
> > "l", "n", "h", "vh", "xh"), class = "factor"), time =
> > structure(c(3L,
> > 3L, 3L, 3L, 3L, 6L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L,
> > 3L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 3L), levels =
> > c("vl",
> > "l", "n", "h", "vh", "xh"), class = "factor"), stor =
> > structure(c(3L,
> > 3L, 3L, 3L, 3L, 6L, 3L, 3L, 3L, 3L, 3L, 3L, 6L, 3L, 3L, 3L, 3L,
> > 3L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L), levels =
> > c("vl",
> > "l", "n", "h", "vh", "xh"), class = "factor"), virt =
> > structure(c(2L,
> > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 3L, 3L,
> > 3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L), levels =
> > c("vl",
> > "l", "n", "h", "vh", "xh"), class = "factor"), turn =
> > structure(c(2L,
> > 2L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
> > 3L, 4L, 4L, 4L, 4L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 2L), levels =
> > c("vl",
> > "l", "n", "h", "vh", "xh"), class = "factor"), acap =
> > structure(c(3L,
> > 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L,
> > 3L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L), levels =
> > c("vl",
> > "l", "n", "h", "vh", "xh"), class = "factor"), aexp =
> > structure(c(3L,
> > 3L, 3L, 3L, 3L, 4L, 5L, 5L, 5L, 5L, 4L, 5L, 5L, 4L, 5L, 4L, 4L,
> > 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), levels =
> > c("vl",
> > "l", "n", "h", "vh", "xh"), class = "factor"), pcap =
> > structure(c(3L,
> > 3L, 3L, 3L, 3L, 4L, 5L, 4L, 5L, 3L, 4L, 4L, 5L, 4L, 4L, 4L, 4L,
> > 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L, 4L, 4L), levels =
> > c("vl",
> > "l", "n", "h", "vh", "xh"), class = "factor"), vexp =
> > structure(c(3L,
> > 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
> > 3L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L), levels =
> > c("vl",
> > "l", "n", "h", "vh", "xh"), class = "factor"), lexp =
> > structure(c(4L,
> > 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 1L, 4L, 4L, 4L, 4L, 3L, 3L,
> > 3L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 4L, 3L, 4L, 3L), levels =
> > c("vl",
> > "l", "n", "h", "vh", "xh"), class = "factor"), modp =
> > structure(c(4L,
> > 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> > 3L, 5L, 5L, 5L, 5L, 4L, 4L, 3L, 3L, 4L, 3L, 4L, 4L), levels =
> > c("vl",
> > "l", "n", "h", "vh", "xh"), class = "factor"), tool =
> > structure(c(3L,
> > 3L, 3L, 3L, 3L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> > 3L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 4L, 3L, 3L, 1L), levels =
> > c("vl",
> > "l", "n", "h", "vh", "xh"), class = "factor"), sced =
> > structure(c(2L,
> > 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> > 3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 3L), levels =
> > c("vl",
> > "l", "n", "h", "vh", "xh"), class = "factor"), equivphyskloc =
> > c(0.025534,
> > 0.006945, 0.008988, 0.002655, 0.067102, 0.006741, 0.019508,
> > 0.005209,
> > 0.101215, 0.010622, 0.101215, 0.019508, 0.152283, 0.031253,
> > 0.014401,
> > 0.014401, 0.037892, 0.009294, 0.015729, 0.012154, 0.032377,
> > 0.035339,
> > 0.004698, 0.009703, 0.00572, 0.012358, 0.091002, 0.007252,
> > 0.180778,
> > 0.307527), act_effort = c(117.6, 31.2, 25.2, 10.8, 352.8, 72,
> > 72, 24, 360, 36, 215, 48, 324, 60, 48, 90, 210, 48, 82, 62, 170,
> > 192, 18, 50, 42, 60, 444, 42, 1248, 2400)), row.names = c(1L,
> > 3L, 5L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 17L, 18L, 19L,
> > 24L, 25L, 26L, 29L, 30L, 31L, 32L, 33L, 34L, 36L, 37L, 38L, 39L,
> > 41L, 42L), class = "data.frame")
> >
> >
> >
> > On Thu, Jun 30, 2022 at 11:28 PM Rui Barradas
> > <ruipbarradas using sapo.pt> wrote:
> >
> > Hello,
> >
> > Please post data in dput format; without it, it's difficult
> > to tell.
> > If I substitute
> >
> > mpg for act_effort
> > mtcars for tr
> >
> > keeping everything else, I don't get any errors.
> > And the error message clearly says that the problem is in tr
> > (the data).
> >
> > Can you post the output of dput(head(tr, 30))?
> >
> > Rui Barradas
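Rui's request works because dput() prints an R expression that rebuilds the object, so readers can paste it straight into their own session. A quick base-R round trip, using the built-in mtcars data set he substituted above (x, txt and y are illustrative names):

```r
# Take a small piece of a built-in data set
x <- head(mtcars, 3)

# What dput() prints, captured as text (this is what would be posted)
txt <- paste(capture.output(dput(x)), collapse = "\n")
cat(txt, "\n")

# Anyone can rebuild the same data frame from that text
y <- eval(parse(text = txt))
all.equal(x, y)  # [1] TRUE
```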
> >
> >
> > At 19:32 on 30/06/2022, Neha gupta wrote:
> > > I am posting this a second time because I didn't get any response from
> > > group members. I am not sure whether there is some problem with the question.
> > >
> > >
> > >
> > > I cannot run the "ranger" model with caret. I am only using the
> > > farff and caret packages and the following code:
> > >
> > > boot <- trainControl(method = "cv", number=10)
> > >
> > > c1 <- train(act_effort ~ ., data = tr,
> > > method = "ranger",
> > > tuneLength = 5,
> > > metric = "MAE",
> > > preProc = c("center", "scale", "nzv"),
> > > trControl = boot)
> > >
> > > The following message is repeated until I interrupt it:
> > >
> > > Error: mtry can not be larger than number of variables in data. Ranger will EXIT now.
> > >
> > >
> > > ______________________________________________
> > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
>