[R-sig-eco] ctree regression tree, interpretation of mean predicted values in terminal nodes
Peter Solymos
solymos at ualberta.ca
Tue Mar 27 23:46:06 CEST 2012
Kay,
That follows directly from how the regression tree algorithm works: it
recursively splits the data, and the prediction for a terminal node is,
e.g., the mean of the observations that fall into it. On the data used to
grow the tree, the node-wise mean of the predictions therefore equals the
node-wise mean of the observations by construction. New data will,
however, contribute to cross-validation error, a measure of prediction
accuracy. The tree gives the 'global' model, and within each terminal node
a 'local' model can be fit. The simplest 'local' model is the
piecewise-constant model, that is, the mean. It's like looking for the
finest homogeneous stratum where the response is constant. There might be
differences in how a split is chosen, how the tree is pruned, and when to
stop (rpart, ctree, etc.), but the same terminal grouping leads to the
same fitted values.
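
A minimal sketch of the same point in base R, without ctree: the single
split at Temp <= 82 and the use of September as held-out data are arbitrary
assumptions made for illustration, not what any tree algorithm would
necessarily choose. It shows that node means match on the data used for
fitting, that another fit on the same grouping (here lm) returns the same
fitted values, and that held-out data need not match.

```r
# Hand-made "tree" with one split, base R only.
# Assumptions for illustration: split at Temp <= 82, September held out.
airq <- subset(airquality, !is.na(Ozone))
train <- airq[airq$Month != 9, ]
test  <- airq[airq$Month == 9, ]

# Assign each observation to a terminal node.
node_of <- function(d) ifelse(d$Temp <= 82, "left", "right")
train$Node <- node_of(train)
test$Node  <- node_of(test)

# Piecewise-constant 'local' model: prediction = training mean per node.
node_mean <- tapply(train$Ozone, train$Node, mean)
train$Prediction <- as.numeric(node_mean[train$Node])
test$Prediction  <- as.numeric(node_mean[test$Node])

# Training data: mean prediction and mean observation per node agree
# by construction.
train_check <- cbind(
  aggregate(Ozone ~ Node, FUN = mean, data = train),
  Pred = aggregate(Prediction ~ Node, FUN = mean, data = train)[, 2]
)
print(train_check)

# Any other fit that uses the same grouping gives the same fitted values.
lm_fit <- lm(Ozone ~ Node, data = train)
same_fitted <- isTRUE(all.equal(unname(fitted(lm_fit)), train$Prediction))
print(same_fitted)

# Held-out data: node means need not agree; the discrepancy shows up
# as prediction error.
test_check <- cbind(
  aggregate(Ozone ~ Node, FUN = mean, data = test),
  Pred = aggregate(Prediction ~ Node, FUN = mean, data = test)[, 2]
)
print(test_check)
test_rmse <- sqrt(mean((test$Ozone - test$Prediction)^2))
print(test_rmse)
```

With a real tree, the same check applies once the node membership is taken
from the fitted object instead of the hand-made node_of() helper.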
Cheers,
Peter
Péter Sólymos
Alberta Biodiversity Monitoring Institute
and Boreal Avian Modelling project
Department of Biological Sciences
CW 405, Biological Sciences Bldg
University of Alberta
Edmonton, Alberta, T6G 2E9, Canada
Phone: 780.492.8534
Fax: 780.492.7635
email <- paste("solymos", "ualberta.ca", sep = "@")
http://www.abmi.ca
http://www.borealbirds.ca
http://sites.google.com/site/psolymos
On Tue, Mar 27, 2012 at 2:42 PM, Kay Cichini <kay.cichini at gmail.com> wrote:
> I can't grasp how it can be that the mean predictions at terminal nodes
> perfectly fit the true mean values of the observed variable at the terminal
> nodes.
> I'm afraid I'm missing something completely obvious here:
>
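> # make a regression tree (ctree is in the party package; airq is assumed
> # here to be airquality without missing Ozone values):
> library(party)
> airq <- subset(airquality, !is.na(Ozone))
> rt <- ctree(Ozone ~ ., data = airq)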
>
> # Validate:
> Prediction <- unlist(treeresponse(rt))
> (Val <- data.frame(Node = rt@where,
>                    Prediction, True = airq$Ozone))
>
> # compare mean prediction per node
> # with observed mean values per node:
> options(scipen = 999)
> cbind(aggregate(True ~ Node, FUN = mean, data = Val),
> Pred = aggregate(Prediction ~ Node, FUN = mean, data = Val)[, 2])
>
> # also, plot predictions vs. true values:
> plot(Val$Prediction, Val$True)
> # regress observed on predicted so the line matches the plot axes
> # (x = Prediction, y = True):
> abline(lm(True ~ Prediction, data = Val))
> myseq <- seq(0, 75, 25)
> abline(v = myseq, h = myseq)
>