[R] Ranger could not work with caret

Fri Jul 1 18:18:36 CEST 2022

Thank you so much for your help. I hope it will work.

However, why doesn't the same error arise when I am using rf? Both methods
have the same parameters and default values.

Best regards

On Friday, July 1, 2022, Rui Barradas <ruipbarradas using sapo.pt> wrote:

> Hello,
>
> The error comes from the ranger parameter mtry becoming greater than the
> number of variables (columns).
> mtry can be set manually through the tuneGrid argument of caret::train. But
> for ranger random forests the grid must also set the split rule and the
> minimum node size.
>
>
> library(caret)
> library(farff)
>
> boot <- trainControl(method = "cv", number = 10)
>
> # set the maximum mtry manually to ncol(tr)
> # this creates a sequence of mtry values
> mtry <- var_seq(ncol(tr), len = 3)  # 3 is the default value
> mtry
> #  [1]  2 13 24
>
> splitrule <- c("variance", "extratrees")
> min.node.size <- 1:10
> mtrygrid <- expand.grid(mtry = mtry,
>                         splitrule = splitrule,
>                         min.node.size = min.node.size)
>
> c1 <- train(act_effort ~ ., data = tr,
>            method = "ranger",
>            tuneLength = 5,
>            metric = "MAE",
>            preProc = c("center", "scale", "nzv"),
>            tuneGrid = mtrygrid,
>            trControl = boot)
> c1
> #  Random Forest
> #
> #  30 samples
> #  23 predictors
> #
> #  Pre-processing: centered (48), scaled (48), remove (58)
> #  Resampling: Cross-Validated (10 fold)
> #  Summary of sample sizes: 28, 27, 27, 28, 27, 27, ...
> #  Resampling results across tuning parameters:
> #
> #    mtry  splitrule   min.node.size  RMSE      Rsquared   MAE
> #     2    variance     1             256.6391  0.8103759  186.3609
> #     2    variance     2             249.7120  0.8628109  183.6696
> #     2    variance     3             258.8240  0.8284449  189.0712
> #
> # [...omit...]
> #
> #    13    extratrees  10             254.9569  0.8918014  191.2524
> #    24    variance     1             177.7188  0.9458652  112.2800
> #    24    variance     2             172.6826  0.9204287  108.5943
> #    24    variance     3             172.9954  0.9271006  109.2554
> #    24    variance     4             172.2467  0.9523067  110.0776
> #    24    variance     5             175.2485  0.9283317  112.8798
> #    24    variance     6             177.9285  0.9369881  115.8970
> #    24    variance     7             180.5959  0.9485035  117.5816
> #    24    variance     8             178.8037  0.9358033  117.8725
> #    24    variance     9             176.5849  0.9210959  117.0055
> #    24    variance    10             178.6439  0.9257969  119.8035
> #    24    extratrees   1             219.1368  0.8801770  141.0720
> #    24    extratrees   2             216.1900  0.8550002  140.9263
> #    24    extratrees   3             212.4138  0.8979379  141.4282
> #    24    extratrees   4             218.2631  0.9121471  146.2908
> #    24    extratrees   5             212.5679  0.9279598  144.2715
> #    24    extratrees   6             218.9856  0.9141754  152.2099
> #    24    extratrees   7             222.8540  0.9412682  152.4614
> #    24    extratrees   8             228.1156  0.9423414  161.8456
> #    24    extratrees   9             226.6182  0.9408306  160.5264
> #    24    extratrees  10             226.9280  0.9429413  165.6878
> #
> #  MAE was used to select the optimal model using the smallest value.
> #  The final values used for the model were mtry = 24, splitrule = variance
> #   and min.node.size = 2.
> plot(c1)
>
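The explanation that mtry grows past the number of columns can be checked against the printed output above. This is only a rough sketch: `mtry_seq` is a hypothetical stand-in for `caret::var_seq()`, assumed to return evenly spaced, floored values from 2 to p, and it assumes the default grid is built before the "nzv" filter drops columns. The column counts 48 and 58 are read off the "Pre-processing" line of the output.

```r
# Column counts read off the printed caret output in this thread
p_kept    <- 48                    # "centered (48), scaled (48)"
p_removed <- 58                    # "remove (58)"
p_before  <- p_kept + p_removed    # columns before the nzv filter

# Hypothetical stand-in for caret::var_seq() on a small regression problem
# (assumption: evenly spaced values from 2 to p, floored)
mtry_seq <- function(p, len) floor(seq(2, p, length.out = len))

# Same call shape as var_seq(ncol(tr), len = 3) in the reply
print(mtry_seq(24, 3))

# Candidates a default grid might try with tuneLength = 5
candidates <- mtry_seq(p_before, 5)
print(candidates)

# Candidates that exceed the columns left after filtering
too_large <- candidates[candidates > p_kept]
print(too_large)
```

Under these assumptions, several default candidates are larger than the 48 columns that survive pre-processing, which would be consistent with the ranger error message. Capping mtry through tuneGrid, as in the reply, avoids those values.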
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> Às 23:03 de 30/06/2022, Neha gupta escreveu:
>
>> Ok, the data is pasted below
>>
>> But on the same data (with everything else the same), other models such as
>> RF and SVM work fine.
>>
>>  > dput(head(tr, 30))
>> structure(list(recordnumber = c(0, 0.02, 0.04, 0.06, 0.07, 0.08,
>> 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.16, 0.17, 0.18, 0.23, 0.24,
>> 0.25, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.35, 0.36, 0.37, 0.38,
>> 0.4, 0.41), projectname = structure(c(1L, 1L, 1L, 1L, 2L, 3L,
>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
>> 4L, 4L, 4L, 4L, 4L, 4L, 5L, 6L), levels = c("de", "erb", "gal",
>> "X", "hst", "slp", "spl", "Y"), class = "factor"), cat2 = structure(c(3L,
>> 3L, 3L, 3L, 3L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L,
>> 9L, 11L, 5L, 4L, 6L, 8L, 3L, 9L, 9L, 9L, 9L, 6L, 7L), levels =
>> c("Avionics",
>> "application_ground", "avionicsmonitoring", "batchdataprocessing",
>> "communications", "datacapture", "launchprocessing", "missionplanning",
>> "monitor_control", "operatingsystem", "realdataprocessing", "science",
>> "simulation", "utility"), class = "factor"), forg = structure(c(2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("f",
>> "g"), class = "factor"), center = structure(c(2L, 2L, 2L, 2L,
>> 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 6L), levels = c("1", "2",
>> "3", "4", "5", "6"), class = "factor"), year = c(0.5, 0.5, 0.5,
>> 0.5, 0.6875, 0.5625, 0.5625, 0.8125, 0.5625, 0.875, 0.5625, 0.75,
>> 0.5625, 0.8125, 0.75, 0.9375, 0.9375, 0.9375, 0.6875, 0.6875,
>> 0.6875, 0.6875, 0.875, 1, 0.9375, 0.9375, 0.9375, 0.9375, 0.5625,
>> 0.25), mode = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>> 3L, 3L, 3L, 3L, 3L), levels = c("embedded", "organic", "semidetached"
>> ), class = "factor"), rely = structure(c(4L, 4L, 4L, 4L, 4L,
>> 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 3L, 3L, 3L, 3L,
>> 3L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 4L), levels = c("vl", "l", "n",
>> "h", "vh", "xh"), class = "factor"), data = structure(c(2L, 2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
>> 5L, 5L, 5L, 5L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 2L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), cplx = structure(c(4L,
>> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 4L,
>> 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), time = structure(c(3L,
>> 3L, 3L, 3L, 3L, 6L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L,
>> 3L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 3L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), stor = structure(c(3L,
>> 3L, 3L, 3L, 3L, 6L, 3L, 3L, 3L, 3L, 3L, 3L, 6L, 3L, 3L, 3L, 3L,
>> 3L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), virt = structure(c(2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 3L, 3L,
>> 3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), turn = structure(c(2L,
>> 2L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
>> 3L, 4L, 4L, 4L, 4L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 2L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), acap = structure(c(3L,
>> 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L,
>> 3L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), aexp = structure(c(3L,
>> 3L, 3L, 3L, 3L, 4L, 5L, 5L, 5L, 5L, 4L, 5L, 5L, 4L, 5L, 4L, 4L,
>> 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), pcap = structure(c(3L,
>> 3L, 3L, 3L, 3L, 4L, 5L, 4L, 5L, 3L, 4L, 4L, 5L, 4L, 4L, 4L, 4L,
>> 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L, 4L, 4L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), vexp = structure(c(3L,
>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
>> 3L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), lexp = structure(c(4L,
>> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 1L, 4L, 4L, 4L, 4L, 3L, 3L,
>> 3L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 4L, 3L, 4L, 3L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), modp = structure(c(4L,
>> 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>> 3L, 5L, 5L, 5L, 5L, 4L, 4L, 3L, 3L, 4L, 3L, 4L, 4L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), tool = structure(c(3L,
>> 3L, 3L, 3L, 3L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>> 3L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 4L, 3L, 3L, 1L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), sced = structure(c(2L,
>> 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>> 3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 3L), levels = c("vl",
>> "l", "n", "h", "vh", "xh"), class = "factor"), equivphyskloc = c(0.025534,
>> 0.006945, 0.008988, 0.002655, 0.067102, 0.006741, 0.019508, 0.005209,
>> 0.101215, 0.010622, 0.101215, 0.019508, 0.152283, 0.031253, 0.014401,
>> 0.014401, 0.037892, 0.009294, 0.015729, 0.012154, 0.032377, 0.035339,
>> 0.004698, 0.009703, 0.00572, 0.012358, 0.091002, 0.007252, 0.180778,
>> 0.307527), act_effort = c(117.6, 31.2, 25.2, 10.8, 352.8, 72,
>> 72, 24, 360, 36, 215, 48, 324, 60, 48, 90, 210, 48, 82, 62, 170,
>> 192, 18, 50, 42, 60, 444, 42, 1248, 2400)), row.names = c(1L,
>> 3L, 5L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 17L, 18L, 19L,
>> 24L, 25L, 26L, 29L, 30L, 31L, 32L, 33L, 34L, 36L, 37L, 38L, 39L,
>> 41L, 42L), class = "data.frame")
>>
>>
>>
>> On Thu, Jun 30, 2022 at 11:28 PM Rui Barradas <ruipbarradas using sapo.pt> wrote:
>>
>>     Hello,
>>
>>     Please post data in dput format; without it, it's difficult to tell.
>>     If I substitute
>>
>>     mpg for act_effort
>>     mtcars for tr
>>
>>     keeping everything else, I don't get any errors.
>>     And the error message says clearly that the error is in tr (data).
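One possible reason the mtcars substitution runs cleanly can be sketched in base R. The sketch assumes caret's default mtry candidates are evenly spaced, floored values from 2 to the number of predictor columns, and that the "nzv" filter removes none of the mtcars columns; neither assumption is confirmed in this thread.

```r
# mtcars ships with base R; mpg is the response substituted for act_effort
predictors <- setdiff(names(mtcars), "mpg")
p <- length(predictors)

# Every predictor is numeric, so the formula interface adds no dummy columns
all_numeric <- all(vapply(mtcars[predictors], is.numeric, logical(1)))
print(all_numeric)

# Stand-in for the default mtry candidates with tuneLength = 5
# (assumption: evenly spaced values from 2 to p, floored)
candidates <- floor(seq(2, p, length.out = 5))
print(candidates)

# No candidate exceeds the number of predictor columns
print(all(candidates <= p))
```

With factor-heavy data such as tr, the column count changes between grid construction and model fitting, which is where the two cases would differ.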
>>
>>     Can you post the output of dput(head(tr, 30))?
>>
>>     Rui Barradas
>>
>>
>>     Às 19:32 de 30/06/2022, Neha gupta escreveu:
>>      > I am posting this for the second time as I didn't get any response from
>>      > group members. I am not sure whether there is a problem with the question.
>>      >
>>      >
>>      >
>>      > I cannot run the "ranger" model with caret. I am only using the farff and
>>      > caret libraries and the following code:
>>      >
>>      > boot <- trainControl(method = "cv", number=10)
>>      >
>>      > c1 <-train(act_effort ~ ., data = tr,
>>      >                method = "ranger",
>>      >                 tuneLength = 5,
>>      >                metric = "MAE",
>>      >                preProc = c("center", "scale", "nzv"),
>>      >                trControl = boot)
>>      >
>>      > The error I get is the following message, repeated until I interrupt it:
>>      >
>>      > Error: mtry can not be larger than number of variables in data. Ranger will
>>      > EXIT now.
>>      >
>>      >       [[alternative HTML version deleted]]
>>      >
>>      > ______________________________________________
>>      > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>      > https://stat.ethz.ch/mailman/listinfo/r-help
>>      > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>      > and provide commented, minimal, self-contained, reproducible code.
>>
>>
