[R] Count data in random Forest
Volker Bahn
lochapoka at web.de
Tue May 6 22:27:30 CEST 2008
Hi Birgit,
I'm not sure that I understand your question, but I'll try to answer
anyway. Regression trees, and therefore also random forests, are
invariant to monotonic transformations of the independent variables,
because a split depends only on the rank order of their values. There
are no distributional assumptions for the independent variables, so
count predictors can be used as they are. The dependent variable,
however, is used in regression to calculate the variances within the two
groups of cases that result from a split. Therefore, it would make sense
to have the dependent variable roughly meet the typical distributional
requirements of least-squares driven models, such as homoscedasticity
and a symmetrical distribution. For count data a square root
transformation is often appropriate.
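
As a rough illustration of both points, here is a minimal base-R sketch.
It is not the randomForest code itself: the helper best_split() and the
simulated Poisson data are made up for this example, and it only looks
at a single least-squares split on one predictor.

```r
# Find the best single split of y on x by minimising the summed
# within-group sum of squares (the least-squares criterion of a
# regression tree). Returns the number of cases in the left group.
best_split <- function(x, y) {
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]
  n <- length(y)
  sse <- function(v) sum((v - mean(v))^2)
  # Candidate cut positions lie between consecutive distinct x values
  cuts <- which(diff(x) > 0)
  loss <- sapply(cuts, function(i) sse(y[1:i]) + sse(y[(i + 1):n]))
  cuts[which.min(loss)]
}

set.seed(1)                        # hypothetical simulated data
x <- rpois(200, lambda = 5) + 1    # count predictor, shifted so log() is defined
y <- rpois(200, lambda = 2 * x)    # count response, variance grows with the mean

# Monotonic transformation of the predictor: the cases are partitioned
# in exactly the same way, because only the ordering of x matters.
n_left_raw <- best_split(x, y)
n_left_log <- best_split(log(x), y)
identical(n_left_raw, n_left_log)  # TRUE

# Transforming the response changes the within-group variances, so the
# chosen split is not guaranteed to stay the same.
n_left_sqrt <- best_split(x, sqrt(y))
c(raw = n_left_raw, sqrt = n_left_sqrt)
```

Whether the square-root version picks a different split depends on the
data at hand, so the last line is only there for comparison.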
HTH
Volker
Birgit Lemcke wrote:
> Hello R-user!
>
> I am running R 2.7.0 on a Power Book (Tiger). (I am still an R and
> statistics beginner.)
>
> I am trying to find the most important variables to divide my dataset
> as given by a categorical variable, using randomForest.
>
> Is randomForest() able to deal with count data?
> Or is there no difference because only the ranks are used in the trees?
>
> Thanks in advance
>
> Birgit
>
> Birgit Lemcke
> Institut für Systematische Botanik
> Zollikerstrasse 107
> CH-8008 Zürich
> Switzerland
> Ph: +41 (0)44 634 8351
> birgit.lemcke at systbot.uzh.ch
>
> 175 years of UZH
> «marvel.experience.understand. Hands-on natural science.»
> MNF anniversary event for young and old.
> 19 April 2008, 10:00 to 02:00
> Campus Irchel, Winterthurerstrasse 190, 8057 Zürich
> Further information: http://www.175jahre.uzh.ch/naturwissenschaft