[R] logistic regression tree

Gavin Simpson gavin.simpson at ucl.ac.uk
Sat Aug 21 00:41:45 CEST 2010


On Fri, 2010-08-20 at 14:46 -0700, Kay Cichini wrote:
> hello,
> 
> my data-collection is not yet finished, but i have already started
> investigating possible analysis methods.
> 
> below i give a very close simulation of my future data-set, however there
> might be more nominal explanatory variables - there will be no continuous
> ones at all (maybe some ordered nominal..).
> 
> i tried several packages today, but the one i fancied most was ctree of the
> party package.
> i can't see why the given no. of datapoints (n=100) might pose a problem
> here - but please teach me better, as i might be naive..

I'm no expert, but single trees are unstable predictors; change your
data slightly and you might get a totally different model/tree. I hope
that worries you?
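
One rough way to see this instability for yourself is to refit a tree to
bootstrap resamples of the data and record which variable is chosen for
the first split. The sketch below is only an illustration under several
assumptions: it uses rpart (shipped with R) rather than ctree, a
vectorised version of the simulation from the original post, an
arbitrary seed, and a made-up helper called root_split().

```r
library(rpart)  # recommended package shipped with R; stands in for ctree here

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Simulation in the spirit of the original post: 2 x 5 x 10 design, n = 100
dat <- expand.grid(fac1 = c("I", "II"),
                   fac2 = LETTERS[1:5],
                   fac3 = letters[1:10])
p <- ifelse(dat$fac1 == "I" & dat$fac3 %in% c("a", "b", "c"), 0.75, 0)
dat$Y <- factor(rbinom(nrow(dat), 1, p), levels = c(0, 1))

# Hypothetical helper: name of the variable used for the first split,
# or "<leaf>" if the fitted tree has no split at all
root_split <- function(d) {
  fit <- rpart(Y ~ fac1 + fac2 + fac3, data = d, method = "class")
  as.character(fit$frame$var[1])
}

# Refit on 200 bootstrap resamples and tabulate the root split variable
roots <- replicate(200, root_split(dat[sample(nrow(dat), replace = TRUE), ]))
print(table(roots))
```

If the table is spread over more than one entry, the fitted tree depends
on which particular 100 observations happened to be sampled.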

Frank's comment was that depending upon the signal-to-noise ratio in
your sample of data, you might need a very large data set indeed, much
larger than your 100 data points/samples, to have any confidence in the
single fitted tree.

For this reason, ensemble or committee methods have been developed that
combine the predictions from many trees fitted to perturbed versions of
the training data. Such methods include boosting and random forests.
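
As a minimal sketch of the idea (bagging by hand, not boosting or a
proper random forest), one can fit a tree to each of many bootstrap
resamples and take a majority vote over the trees. The code assumes
rpart as the base learner, the simulated design from the original post,
an arbitrary seed, and B = 100 trees; in practice packages such as
randomForest, or cforest() in party, do this properly.

```r
library(rpart)  # recommended package shipped with R

set.seed(2)  # arbitrary seed, only so the sketch is reproducible

# Simulation in the spirit of the original post: 2 x 5 x 10 design, n = 100
dat <- expand.grid(fac1 = c("I", "II"),
                   fac2 = LETTERS[1:5],
                   fac3 = letters[1:10])
p <- ifelse(dat$fac1 == "I" & dat$fac3 %in% c("a", "b", "c"), 0.75, 0)
dat$Y <- factor(rbinom(nrow(dat), 1, p), levels = c(0, 1))

B <- 100  # number of bootstrap trees (an arbitrary choice)

# votes[i, b] is the class predicted for row i of dat by tree b
votes <- replicate(B, {
  boot <- dat[sample(nrow(dat), replace = TRUE), ]
  fit <- rpart(Y ~ fac1 + fac2 + fac3, data = boot, method = "class")
  as.character(predict(fit, newdata = dat, type = "class"))
})

# Majority vote across the B trees
prop_present <- rowMeans(votes == "1")
bagged <- factor(as.integer(prop_present > 0.5), levels = c(0, 1))

print(table(observed = dat$Y, bagged = bagged))
```

Note that the table compares predictions with the same data used for
fitting, so it is optimistic; out-of-bag or cross-validated error would
be the fairer measure.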

We are venturing into territory not suited to the mailing-list format:
statistical consultancy. As Achim is local to you and has kindly offered
to meet you, I would strongly suggest you take up his offer.

In the meantime, here are a couple of references to look at if you
aren't familiar with these statistical machine learning techniques.

Cutler et al. (2007) Random forests for classification in ecology.
Ecology 88(11), 2783-2792.

Elith, J., Leathwick, J.R., and Hastie, T. (2008) A working guide to
boosted regression trees. Journal of Animal Ecology 77, 802-813.

Also, don't dismiss the logistic regression model. Modern techniques
like the lasso and elastic net are available for GLMs such as this and
include model selection as part of their fitting. These are underused by
ecologists (IMHO) who seem to like (abuse?) the information theoretic
approaches and step-wise selection procedures... (apologies to
ecologists here [I am one too] for being general!) See:

Dahlgren, J.P. (2010) Alternative regression methods are not considered
in Murtaugh (2009) or by ecologists in general. Ecology Letters 13(5),
E7-E9.
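
For a flavour of what a penalised logistic regression looks like in R,
here is a rough sketch using the glmnet package (assumed to be installed
from CRAN). The data are again a vectorised version of the simulation
from the original post; the main-effects-only design matrix, the mixing
value alpha = 0.5 and the seed are arbitrary choices made for
illustration, not recommendations.

```r
library(glmnet)  # assumed to be installed from CRAN

set.seed(3)  # arbitrary seed, only so the sketch is reproducible

# Simulation in the spirit of the original post: 2 x 5 x 10 design, n = 100
dat <- expand.grid(fac1 = c("I", "II"),
                   fac2 = LETTERS[1:5],
                   fac3 = letters[1:10])
p <- ifelse(dat$fac1 == "I" & dat$fac3 %in% c("a", "b", "c"), 0.75, 0)
y <- rbinom(nrow(dat), 1, p)

# Dummy-coded main effects; the intercept column is dropped because
# glmnet fits its own intercept
x <- model.matrix(~ fac1 + fac2 + fac3, data = dat)[, -1]

# Elastic-net logistic regression (alpha = 1 would be the lasso);
# the penalty strength lambda is chosen by cross-validation
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 0.5)

# Coefficients at the more conservative "one standard error" lambda;
# terms shrunk exactly to zero have been dropped from the model
coefs <- coef(cvfit, s = "lambda.1se")
print(coefs)
```

With so few presences, glmnet may warn about small class counts, which
is itself a hint about how little information n = 100 carries here.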

HTH

G

> i'd be very glad about comments on the use of ctree on such a dataset and
> on whether i am overlooking possible pitfalls....
> 
> thank you all,
> kay
> 
> ######################################################################################
> # an example with 3 nominal explanatory variables:
> # Y is presence of a certain invasive plant species
> # introduced effect for fac1 and fac3, fac2 without effect.
> # presence with prob. 0.75 in factor combination fac1=I (say fac1 is
> # geogr. region) and fac3 = a|b|c (say all richer substrates).
> # presence is not influenced by fac2 (which might be, e.g., vegetation type).
> ######################################################################################
> library(party)
> 
> # full factorial design: 2 x 5 x 10 = 100 rows
> dat <- expand.grid(fac1 = c("I", "II"),
>                    fac2 = LETTERS[1:5],
>                    fac3 = letters[1:10])
> dat <- dat[order(dat$fac1, dat$fac2, dat$fac3), ]
> print(dat)
> 
> # presence probability: 0.75 where fac1 is "I" and fac3 is a, b or c; 0 elsewhere
> p <- ifelse(dat$fac1 == "I" & dat$fac3 %in% c("a", "b", "c"), 0.75, 0)
> dat$Y <- factor(rbinom(nrow(dat), 1, p))
> 
> tr <- ctree(Y ~ fac1 + fac2 + fac3, data = dat)
> plot(tr)
> ######################################################################################
> 
> 
> -----
> ------------------------
> Kay Cichini
> Postgraduate student
> Institute of Botany
> Univ. of Innsbruck
> ------------------------
> 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


