[R] mob(party) formula question
Achim Zeileis
Achim.Zeileis at wu-wien.ac.at
Wed Aug 13 14:08:47 CEST 2008
On Wed, 13 Aug 2008, Birgitle wrote:
> I try tu use mob() with my data.frame ('data.frame': 288 obs. of 81
> variables; factors, numerics and ordered factors)
> My response is a binary variable and I should use for modelling a logistic
> regression (family=binomial).
>
> I read in the "MOB" Vignette that I could use a formula like this if I would
> like to have only partitioning variables apart from the response.
>
> Test.mob<-mob(Resp~1|Var1+Var2+...., data=dataframe, model=glinearModel,
> family=binomial())
This works for me. Considering an example that is easily reproducible:
classifying just two (out of three) species in the iris data.
iris2 <- iris[-(1:50),]
iris2$Species <- factor(iris2$Species)
mb <- mob(Species ~ 1 | Petal.Length + Petal.Width + Sepal.Length +
Sepal.Width, data = iris2, model = glinearModel, family = binomial())
and this runs fine, just selecting a single split
R> mb
1) Petal.Width <= 1.7; criterion = 1, statistic = 81.818
2)* weights = 54
Terminal node model
Binomial GLM with coefficients:
(Intercept)
-2.282
1) Petal.Width > 1.7
3)* weights = 46
Terminal node model
Binomial GLM with coefficients:
(Intercept)
3.807
> but this gives me back an error-message:
>
> Error in `[.data.frame`(x, r, vars, drop = drop) :
> undefined columns selected
>
> But Var1, Var2 and Resp are in my dataframe. Why do I get this error?
More importantly, when do you get this error? My guess is that this is
during plotting, right?
If so, then the problem is that the plot() method for "mob" object by
default calls node_bivplot() in each terminal node which is designed for
generating partial regressor plots. In this situation this does not make
sense because you don't have regressors in the terminal nodes.
We haven't got a panel function for the type of model you are looking at
but I've just hacked a simple one that should be sufficient for your
purposes. It is essentially like node_barplot() but exploits the binomial
model. It is attached below. With this you can do
plot(mb, terminal_panel = myplot, tnex = 2)
> I am also wondering how I can find out which variables I should use for
> partitioning and which for modelling?
For the variables for which a linear specification makes sense (at least
in each component) then you should include them for modeling. And those
variables for which it is not clear a priori what a useful parametric
specification would be should be used as partitioning variables.
> There are correlations between some variables in my dataframe. Would it be a
> possibility to use always one variable of the correlated variable-pairs for
> partitioning and one for modelling?
You can do that, but you could also do other combinations. That probably
depends on your application.
hth,
Z
myplot <- function(ctreeobj,
col = "black",
fill = NULL,
beside = NULL,
ymax = NULL,
ylines = NULL,
widths = 1,
gap = NULL,
reverse = NULL,
id = TRUE)
{
getMaxPred <- function(x) {
mp <- max(x$prediction)
mpl <- ifelse(x$terminal, 0, getMaxPred(x$left))
mpr <- ifelse(x$terminal, 0, getMaxPred(x$right))
return(max(c(mp, mpl, mpr)))
}
y <- response(ctreeobj)[[1]]
if(is.factor(y) || class(y) == "was_ordered") {
ylevels <- levels(y)
if(is.null(beside)) beside <- if(length(ylevels) < 3) FALSE else TRUE
if(is.null(ymax)) ymax <- if(beside) 1.1 else 1
if(is.null(gap)) gap <- if(beside) 0.1 else 0
} else {
if(is.null(beside)) beside <- FALSE
if(is.null(ymax)) ymax <- getMaxPred(ctreeobj at tree) * 1.1
ylevels <- seq(along = ctreeobj at tree$prediction)
if(length(ylevels) < 2) ylevels <- ""
if(is.null(gap)) gap <- 1
}
if(is.null(reverse)) reverse <- !beside
if(is.null(fill)) fill <- gray.colors(length(ylevels))
if(is.null(ylines)) ylines <- if(beside) c(3, 2) else c(1.5, 2.5)
### panel function for barplots in nodes
rval <- function(node) {
## parameter setup
fm <- node$model
pred <- fm$family$linkinv(coef(fm))
if(reverse) {
pred <- rev(pred)
ylevels <- rev(ylevels)
}
np <- length(pred)
nc <- if(beside) np else 1
fill <- rep(fill, length.out = np)
widths <- rep(widths, length.out = nc)
col <- rep(col, length.out = nc)
ylines <- rep(ylines, length.out = 2)
gap <- gap * sum(widths)
yscale <- c(0, ymax)
xscale <- c(0, sum(widths) + (nc+1)*gap)
top_vp <- viewport(layout = grid.layout(nrow = 2, ncol = 3,
widths = unit(c(ylines[1], 1, ylines[2]), c("lines", "null", "lines")),
heights = unit(c(1, 1), c("lines", "null"))),
width = unit(1, "npc"),
height = unit(1, "npc") - unit(2, "lines"),
name = paste("node_barplot", node$nodeID, sep = ""))
pushViewport(top_vp)
grid.rect(gp = gpar(fill = "white", col = 0))
## main title
top <- viewport(layout.pos.col=2, layout.pos.row=1)
pushViewport(top)
mainlab <- paste(ifelse(id, paste("Node", node$nodeID, "(n = "), "n = "),
sum(node$weights), ifelse(id, ")", ""), sep = "")
grid.text(mainlab)
popViewport()
plot <- viewport(layout.pos.col=2, layout.pos.row=2,
xscale=xscale, yscale=yscale,
name = paste("node_barplot", node$nodeID, "plot",
sep = ""))
pushViewport(plot)
if(beside) {
xcenter <- cumsum(widths+gap) - widths/2
for (i in 1:np) {
grid.rect(x = xcenter[i], y = 0, height = pred[i],
width = widths[i],
just = c("center", "bottom"), default.units = "native",
gp = gpar(col = col[i], fill = fill[i]))
}
if(length(xcenter) > 1) grid.xaxis(at = xcenter, label = FALSE)
grid.text(ylevels, x = xcenter, y = unit(-1, "lines"),
just = c("center", "top"),
default.units = "native", check.overlap = TRUE)
grid.yaxis()
} else {
ycenter <- cumsum(pred) - pred
for (i in 1:np) {
grid.rect(x = xscale[2]/2, y = ycenter[i], height = min(pred[i], ymax - ycenter[i]),
width = widths[1],
just = c("center", "bottom"), default.units = "native",
gp = gpar(col = col[i], fill = fill[i]))
}
if(np > 1) {
grid.text(ylevels[1], x = unit(-1, "lines"), y = 0,
just = c("left", "center"), rot = 90,
default.units = "native", check.overlap = TRUE)
grid.text(ylevels[np], x = unit(-1, "lines"), y = ymax,
just = c("right", "center"), rot = 90,
default.units = "native", check.overlap = TRUE)
}
if(np > 2) {
grid.text(ylevels[-c(1,np)], x = unit(-1, "lines"), y = ycenter[-c(1,np)],
just = "center", rot = 90,
default.units = "native", check.overlap = TRUE)
}
grid.yaxis(main = FALSE)
}
grid.rect(gp = gpar(fill = "transparent"))
upViewport(2)
}
return(rval)
}
class(myplot) <- "grapcon_generator"
> I would be very happy if somebody could give me some hints or answers to my
> questions.
>
> Many thanks in advance.
>
> B.
>
>
>
> -----
> The art of living is more like wrestling than dancing.
> (Marcus Aurelius)
> --
> View this message in context: http://www.nabble.com/mob%28party%29-formula-question-tp18959898p18959898.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
More information about the R-help
mailing list