[R] Count data in random Forest

Volker Bahn lochapoka at web.de
Tue May 6 22:27:30 CEST 2008


Hi Birgit,

I'm not sure I understand your question, but I'll try to answer anyway. 
Regression trees, and therefore random forests, are invariant to 
monotonic transformations of the independent variables, because a split 
depends only on the ordering of a predictor's values. There are no 
distributional assumptions for the independent variables, so count 
predictors can be used as they are. The dependent variable, however, is 
used to calculate the variances within the two groups of cases that 
result from a split. It therefore makes sense for the dependent 
variable to meet the typical distributional requirements of 
least-squares models, such as homoscedasticity and a symmetrical 
distribution. For a count response, a square-root transformation is 
often appropriate.
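
To make both points concrete, here is a minimal sketch in base R. It 
uses simulated data only, and best_split() is a hypothetical helper 
written for this illustration, not a function from the randomForest 
package. It mimics a single least-squares split, so take it as an 
illustration of the idea rather than of what randomForest() does 
internally.

```r
# Hypothetical helper: find the cut point on x that minimizes the summed
# within-group sum of squares of y, and return the resulting grouping.
best_split <- function(x, y) {
  cuts <- sort(unique(x))
  cuts <- cuts[-length(cuts)]        # cutting at the maximum leaves one group empty
  sse <- sapply(cuts, function(cut) {
    left  <- y[x <= cut]
    right <- y[x >  cut]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  x <= cuts[which.min(sse)]          # TRUE = case falls in the left group
}

set.seed(1)
x <- rpois(200, lambda = 5)          # simulated count predictor
y <- 2 * (x > 4) + rnorm(200)        # simulated continuous response

# 1) A monotonic transformation of the predictor leaves the grouping unchanged.
groups_raw  <- best_split(x, y)
groups_sqrt <- best_split(sqrt(x), y)
same_groups <- identical(groups_raw, groups_sqrt)
print(same_groups)                   # TRUE: the same cases end up in each group

# 2) For a count response, the variance grows with the mean; the square
#    root makes the variance much more similar across groups.
lambda   <- rep(c(5, 50), each = 1000)
counts   <- rpois(length(lambda), lambda)
var_raw  <- tapply(counts, lambda, var)
var_sqrt <- tapply(sqrt(counts), lambda, var)
print(var_raw)                       # clearly larger for lambda = 50
print(var_sqrt)                      # of similar size in both groups
```

The first check holds for any strictly increasing transformation, 
because the candidate groupings depend only on the order of x. The 
second part assumes Poisson-like counts; for strongly overdispersed 
counts a different transformation may work better.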

HTH

Volker

Birgit Lemcke wrote:
> Hello R users!
>
> I am running R 2.7.0 on a PowerBook (Tiger). (I am still a beginner 
> in R and statistics.)
>
> I am trying to find the variables that are most important for 
> dividing my dataset into the groups given by a categorical variable, 
> using randomForest.
>
> Is randomForest() able to deal with count data?
> Or does it make no difference because only the ranks are used in the 
> trees?
>
> Thanks in advance
>
> Birgit
>
> Birgit Lemcke
> Institut für Systematische Botanik
> Zollikerstrasse 107
> CH-8008 Zürich
> Switzerland
> Ph: +41 (0)44 634 8351
> birgit.lemcke at systbot.uzh.ch
>
> 175 years of UZH
> «Marvel. Experience. Understand. Hands-on science.»
> MNF anniversary event for young and old.
> 19 April 2008, 10:00 to 02:00
> Campus Irchel, Winterthurerstrasse 190, 8057 Zürich
> More information: http://www.175jahre.uzh.ch/naturwissenschaft


