[R] User defined split function in rpart

Tobias Guennel tguennel at vcu.edu
Tue Feb 20 19:47:03 CET 2007


I have made some progress with the user-defined splitting function and got
most of what I needed to work. However, I am still stuck on accessing
the node data. It would probably be enough if somebody could tell me how I
can access the original data frame passed in the call to rpart.
So if the call is:

fit0 <- rpart(Sat ~ Infl + Cont + Type,
              housing, control = rpart.control(minsplit = 10, xval = 0),
              method = alist)

how can I access the housing data frame within the user-defined splitting
function?
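
One workaround that is sometimes suggested for this is to carry a row index through the response, so that the callbacks can look up the node's rows in the original data frame. The sketch below is a minimal illustration under stated assumptions, not a confirmed answer: it uses mtcars instead of housing, a hypothetical `row_id` column, a plain weighted-mean criterion instead of a proportional odds model, and it assumes that rpart hands a two-column response to the callbacks as a matrix when init reports numy = 2.

```r
library(rpart)

# Hypothetical index column: position of each row in the original data frame.
dat <- transform(mtcars, row_id = seq_len(nrow(mtcars)))
seen <- new.env()
seen$sizes <- integer(0)          # node sizes observed by eval_fn

init_fn <- function(y, offset, parms, wt) {
  # y has two columns: the response and the row index.
  list(y = y, parms = NULL, numresp = 1, numy = 2,
       summary = function(yval, dev, wt, ylevel, digits)
         paste0("  mean=", format(signif(yval, digits))))
}

eval_fn <- function(y, wt, parms) {
  node <- dat[y[, 2], , drop = FALSE]   # original rows belonging to this node
  stopifnot(all(node$mpg == y[, 1]))    # sanity check on the lookup
  seen$sizes <- c(seen$sizes, nrow(node))
  wmean <- sum(node$mpg * wt) / sum(wt)
  list(label = wmean, deviance = sum(wt * (node$mpg - wmean)^2))
}

split_fn <- function(y, wt, x, parms, continuous) {
  if (!continuous) stop("this sketch handles numeric predictors only")
  resp <- y[, 1]
  n <- length(resp)
  resp <- resp - sum(resp * wt) / sum(wt)
  left_sum <- cumsum(resp * wt)[-n]
  left_wt <- cumsum(wt)[-n]
  right_wt <- sum(wt) - left_wt
  lmean <- left_sum / left_wt
  rmean <- -left_sum / right_wt
  list(goodness = (left_wt * lmean^2 + right_wt * rmean^2) / sum(wt * resp^2),
       direction = ifelse(lmean < 0, -1, 1))
}

fit_idx <- rpart(cbind(mpg, row_id) ~ disp + hp + drat, data = dat,
                 method = list(init = init_fn, eval = eval_fn, split = split_fn),
                 control = rpart.control(minsplit = 10, xval = 0))
```

Inside eval_fn, `node` is then an ordinary data frame holding only the current node's rows, which is the kind of object a model-fitting routine could be given.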

Any input would be highly appreciated!

Thank you
Tobias Guennel

-----Original Message-----
From: Tobias Guennel [mailto:tguennel at vcu.edu] 
Sent: Monday, February 19, 2007 3:40 PM
To: 'r-help at stat.math.ethz.ch'
Subject: [R] User defined split function in rpart

Maybe I should explain my problem in a little more detail.
The rpart package allows user-defined split functions. An example is
given in the source/test directory of the package as usersplits.R.
The comments say that three functions have to be supplied:
1. "The 'evaluation' function.  Called once per node.
  Produce a label (1 or more elements long) for labeling each node,
  and a deviance." 
2. The split function, where most of the work occurs.
   Called once per split variable per node.
3. The init function:
   fix up y to deal with offsets
   return a dummy parms list
   numresp is the number of values produced by the eval routine's "label".
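
For orientation, here is a minimal sketch of the three functions, modelled on the anova-style example in usersplits.R. The argument lists (y, offset, parms, wt for init; y, wt, parms for eval; y, wt, x, parms, continuous for split) are the ones I believe rpart uses when calling them. The names init_fn, eval_fn and split_fn, the mtcars data and the restriction to numeric predictors are assumptions made for illustration.

```r
library(rpart)

# init: called once; adjusts y for offsets and describes the response.
init_fn <- function(y, offset, parms, wt) {
  if (length(offset)) y <- y - offset
  summary_fn <- function(yval, dev, wt, ylevel, digits) {
    paste0("  mean=", format(signif(yval, digits)),
           ", MSE=", format(signif(dev / wt, digits)))
  }
  list(y = c(y), parms = NULL, numresp = 1, numy = 1, summary = summary_fn)
}

# eval: called once per node with that node's responses and weights.
eval_fn <- function(y, wt, parms) {
  wmean <- sum(y * wt) / sum(wt)
  list(label = wmean, deviance = sum(wt * (y - wmean)^2))
}

# split: called once per split variable per node; y and wt arrive ordered by x.
split_fn <- function(y, wt, x, parms, continuous) {
  if (!continuous) stop("this sketch handles numeric predictors only")
  n <- length(y)
  y <- y - sum(y * wt) / sum(wt)          # centre the response
  left_sum <- cumsum(y * wt)[-n]
  left_wt <- cumsum(wt)[-n]
  right_wt <- sum(wt) - left_wt
  lmean <- left_sum / left_wt
  rmean <- -left_sum / right_wt
  # goodness[i] rates the split x[1:i] versus x[(i+1):n]
  goodness <- (left_wt * lmean^2 + right_wt * rmean^2) / sum(wt * y^2)
  list(goodness = goodness, direction = ifelse(lmean < 0, -1, 1))
}

fit_user <- rpart(mpg ~ disp + hp + drat, data = mtcars,
                  method = list(init = init_fn, eval = eval_fn, split = split_fn),
                  control = rpart.control(minsplit = 10, xval = 0))
```

Note that with these argument lists neither eval nor split receives a data frame: each call sees only the responses, weights and (for split) one predictor for the observations in the current node.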

I have altered the evaluation function and the split function for my needs.
Within those functions, I need to fit a proportional odds model to the data
of the current node. I am using the polr() routine from the MASS package to
fit the model.
Now my problem is: how can I call polr() with only the data of
the current node? This is what I have tried so far:

evalfunc <- function(y, x, parms, data) {
  pomnode <- polr(data$y ~ data$x, data, weights = data$Freq)
  parprobs <- predict(pomnode, type = "probs")
  dev <- 0
  K <- dim(parprobs)[2]
  N <- dim(parprobs)[1] / K
  for (i in 1:N) {
    tempsum <- 0
    Ni <- 0
    for (l in 1:K) {
      Ni <- Ni + data$Freq[K * (i - 1) + l]
    }
    for (j in 1:K) {
      tempsum <- tempsum + data$Freq[K * (i - 1) + j] / Ni *
        log(parprobs[i, j] * Ni / data$Freq[K * (i - 1) + j])
    }
    dev <- dev + Ni * tempsum
  }
  dev <- -2 * dev
  wmean <- 1
  list(label = wmean, deviance = dev)
}

I get the error:

Error in eval(expr, envir, enclos) : argument "data" is missing, with no default

How can I use the data of the current node?
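
Separately from the data-access question, the deviance loop in evalfunc can be written without explicit loops. The sketch below is a hypothetical helper, grouped_deviance, that computes the same quantity, 2 * sum(f_ij * log(f_ij / (N_i * p_ij))), under two assumptions: the counts come in blocks of K consecutive entries per covariate pattern (as the K*(i-1)+j indexing implies), and probs has exactly one row per covariate pattern. Cells with a zero count are taken to contribute nothing.

```r
# freq:  observed counts, K consecutive entries per covariate pattern
# probs: N x K matrix of fitted probabilities, one row per pattern (assumption)
grouped_deviance <- function(freq, probs) {
  K <- ncol(probs)
  obs <- matrix(freq, ncol = K, byrow = TRUE)   # N x K table of observed counts
  stopifnot(nrow(obs) == nrow(probs))
  expected <- rowSums(obs) * probs              # N_i * p_ij
  terms <- ifelse(obs > 0, obs * log(obs / expected), 0)
  2 * sum(terms)
}

# Usage: two covariate patterns, three response categories.
freq <- c(10, 20, 30, 5, 5, 10)
probs <- rbind(c(0.2, 0.3, 0.5),
               c(0.25, 0.25, 0.5))
dev <- grouped_deviance(freq, probs)
```

If the probability matrix has one row per data row instead, so that each pattern's probabilities are repeated K times, one row per block would have to be selected first, for example with seq(1, nrow(parprobs), by = K).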

Thank you
Tobias Guennel


