[R] random forest - optimising mtry

Liaw, Andy andy_liaw at merck.com
Wed Oct 13 12:10:39 CEST 2004


What I would try is to repeat the RF with both mtry values a few times and
see how variable those inter-class proximities are.  Even though the
differences seem large, they could be within the noise.  If so, the only way
to get more stable proximities would be to increase the number of trees.
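
A rough sketch of that check, in case it helps.  The data here are a
simulated stand-in (an assumption: 106 cases, 172 variables, 4 classes, with
a class shift in the first 10 variables), and class_prox() and run() are
made-up helper names, not part of randomForest.  Substitute your own x and y.

```r
library(randomForest)

# Stand-in data (assumption, not the real spectra): 106 x 172, 4 classes.
set.seed(1)
n <- 106; p <- 172
y <- factor(rep(1:4, length.out = n))
x <- matrix(rnorm(n * p), n, p)
x[, 1:10] <- x[, 1:10] + as.integer(y)  # class-dependent shift in 10 variables

# Mean proximity between every pair of classes (hypothetical helper).
class_prox <- function(prox, y) {
  pairs <- combn(levels(y), 2)
  out <- apply(pairs, 2, function(cl) mean(prox[y == cl[1], y == cl[2]]))
  names(out) <- paste0("op", pairs[1, ], pairs[2, ])
  out
}

# Refit the forest several times for one mtry; one row per fit.
run <- function(mtry, reps = 5, ntree = 2500) {
  t(replicate(reps, {
    rf <- randomForest(x, y, mtry = mtry, ntree = ntree, proximity = TRUE)
    class_prox(rf$proximity, y)
  }))
}

res30 <- run(30)
res63 <- run(63)

# Compare the between-mtry differences with the run-to-run spread.
rbind(mean30 = colMeans(res30), sd30 = apply(res30, 2, sd),
      mean63 = colMeans(res63), sd63 = apply(res63, 2, sd))
```

If the difference between the two mean rows is no bigger than a couple of the
sd values, the change in proximities is probably just noise.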

If you are getting 0 OOB error (and the inter-class proximities are so low),
the classes are probably rather well separated.  I wouldn't worry too much
about pinning down an `optimal' mtry.  The OOB error curve is usually fairly
flat over a wide range of mtry, so you're likely to get different optima on
repeated runs, with similar OOB errors.
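
To see how flat the curve is, something along these lines could be tried.
Again the data are a simulated stand-in (an assumption), the mtry grid and
ntree=500 are arbitrary choices for illustration, and oob_err() is a made-up
helper name.

```r
library(randomForest)

# Stand-in data (assumption, not the real spectra): 106 x 172, 4 classes.
set.seed(2)
n <- 106; p <- 172
y <- factor(rep(1:4, length.out = n))
x <- matrix(rnorm(n * p), n, p)
x[, 1:10] <- x[, 1:10] + as.integer(y)

mtry_grid <- c(5, 13, 30, 63, 100, 172)  # 13 is the default, floor(sqrt(172))
ntree <- 500

# OOB error of a single forest after all trees are grown (hypothetical helper).
oob_err <- function(m) {
  rf <- randomForest(x, y, mtry = m, ntree = ntree)
  rf$err.rate[ntree, "OOB"]
}

# Rows: repeated runs; columns: mtry values.
oob <- t(replicate(5, sapply(mtry_grid, oob_err)))
colnames(oob) <- mtry_grid

round(oob, 3)
mtry_grid[apply(oob, 1, which.min)]  # the "best" mtry found in each run
```

If the last line gives a different mtry from run to run while the errors in
each row are about the same, there is little to gain from tuning further.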

HTH,
Andy

> From: Ute
> 
> Dear R-helpers,
> 
> I'm working on mass spectra with randomForest in R. Following the
> recommendations for data with noisy variables, I don't want to use the
> default mtry (sqrt of the number of variables), but I'm not sure up to
> which ratio mtry/nvariables it makes sense to increase mtry without
> "overtuning" the RF.
> Here is my example: I have 106 spectra belonging to 4 classes, and the
> number of variables is 172. I'm interested in information about the
> variables (importance, split points etc.) and in proximities.
> First I ran a forest with mtry=30 and ntree=2500. The result was an
> OOB estimate of the overall error rate of zero, i.e. perfect
> classification. To explore my results, I calculated the average
> proximity between the classes. I got:
> > res
>            op12      op13       op14       op23      op24       op34
> [1,] 0.06145473 0.1369406 0.08036264 0.06171053 0.1113126 0.06732087
> For me, the important point of these values is that classes 1 and 3,
> as well as classes 2 and 4, share more common features than the other
> pairs of classes. I have already worked a lot with these data and
> looked at my spectra a lot, and I believe these proximities to be
> realistic.
>
> Then I ran the tuneRF function (step factor 1.5) and got mtry=63.
> A new forest with this mtry and 2500 trees gave me perfect
> classification as well, but the relation between the proximity values
> changed a lot:
> > res
>           op12     op13       op14       op23       op24       op34
> [1,] 0.1092702 0.117489 0.09696328 0.08725208 0.08495621 0.06506148
> 
> This is what makes me think that I have overtuned my second forest.
> So how should I choose mtry?
> 
> Best regards,
> Ute
> 