[R] mob(party) formula question

Wed Aug 13 14:08:47 CEST 2008

On Wed, 13 Aug 2008, Birgitle wrote:

> I try to use mob() with my data.frame ('data.frame': 288 obs. of  81
> variables; factors, numerics and ordered factors)
> My response is a binary variable and I should use for modelling a logistic
> regression (family=binomial).
>
> I read in the "MOB" Vignette that I could use a formula like this if I would
> like to have only partitioning variables apart from the response.
>
> Test.mob<-mob(Resp~1|Var1+Var2+...., data=dataframe, model=glinearModel,
> family=binomial())

This works for me. Consider an example that is easily reproducible:
classifying just two (out of three) species in the iris data.

iris2 <- iris[-(1:50),]
iris2$Species <- factor(iris2$Species)
mb <- mob(Species ~ 1 | Petal.Length + Petal.Width + Sepal.Length +
    Sepal.Width, data = iris2, model = glinearModel, family = binomial())

and this runs fine, selecting just a single split:

R> mb
1) Petal.Width <= 1.7; criterion = 1, statistic = 81.818
    2)*  weights = 54
Terminal node model
Binomial GLM with coefficients:
(Intercept)
       -2.282

1) Petal.Width > 1.7
    3)*  weights = 46
Terminal node model
Binomial GLM with coefficients:
(Intercept)
        3.807
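
The intercepts printed above are on the logit scale. As a rough check in base R (a sketch that does not use party at all and simply takes the split at Petal.Width <= 1.7 reported above as given), intercept-only binomial GLMs can be fitted to the two subsets by hand and their coefficients mapped back to probabilities with plogis():

```r
# Rebuild the two-species data used above
iris2 <- iris[-(1:50), ]
iris2$Species <- factor(iris2$Species)

# Split by hand at the cut point reported by mob()
left  <- subset(iris2, Petal.Width <= 1.7)
right <- subset(iris2, Petal.Width > 1.7)

# Intercept-only logistic regressions, one per subset
fit_left  <- glm(Species ~ 1, data = left,  family = binomial())
fit_right <- glm(Species ~ 1, data = right, family = binomial())

# Log-odds of the second factor level (virginica) in each subset
coef(fit_left)
coef(fit_right)

# The same quantities expressed as probabilities
p_left  <- plogis(coef(fit_left))
p_right <- plogis(coef(fit_right))
```

The back-transformed values are simply the observed proportions of the second species in each subset, which is all an intercept-only model can capture.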

> but this gives me back an error-message:
>
> Error in `[.data.frame`(x, r, vars, drop = drop) :
>  undefined columns selected
>
> But Var1, Var2 and Resp are in my dataframe. Why do I get this error?

More importantly, when do you get this error? My guess is that this is 
during plotting, right?

If so, then the problem is that the plot() method for "mob" objects by
default calls node_bivplot() in each terminal node, which is designed for
generating partial regressor plots. That does not make sense in this
situation because you don't have regressors in the terminal nodes.
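
For reference, this particular message is what `[.data.frame` raises whenever a column is requested that the data frame does not contain. A minimal base-R sketch reproduces it (the data frame `df` and the column name "b" are made up for illustration and have nothing to do with your data):

```r
# A data frame with a single column "a"
df <- data.frame(a = 1:3)

# Asking for a column that does not exist triggers the error;
# tryCatch() captures its message instead of stopping
msg <- tryCatch(df[, "b"], error = function(e) conditionMessage(e))
msg
```

Running the mob() call and the plot() call as two separate statements, and calling traceback() right after the failure, should show which of the two steps requests the missing columns.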

We haven't got a panel function for the type of model you are looking at 
but I've just hacked a simple one that should be sufficient for your 
purposes. It is essentially like node_barplot() but exploits the binomial 
model. It is attached below. With this you can do
    plot(mb, terminal_panel = myplot, tnex = 2)

> I am also wondering how I can find out which variables I should use for
> partitioning and which for modelling?

Variables for which a linear specification makes sense (at least within
each component) should be included in the model. Variables for which it
is not clear a priori what a useful parametric specification would be
should be used as partitioning variables.
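
In the formula, regressors go to the left of the `|` and partitioning variables to the right. With many columns it can be easier to assemble the formula from character vectors. The following is only a sketch: the assignment of the iris columns to `model_vars` and `part_vars` is arbitrary and purely illustrative, not a recommendation.

```r
# Hypothetical assignment of variables, for illustration only
model_vars <- "Sepal.Length"                                   # regressors in each node
part_vars  <- c("Petal.Length", "Petal.Width", "Sepal.Width")  # partitioning variables

# Assemble "response ~ regressors | partitioning variables"
fml <- as.formula(paste(
  "Species ~", paste(model_vars, collapse = " + "),
  "|", paste(part_vars, collapse = " + ")
))
fml

# For partitioning variables only, replace the regressor part by 1
fml0 <- as.formula(paste("Species ~ 1 |", paste(part_vars, collapse = " + ")))
fml0
```

The resulting formula objects are meant to be passed as the first argument of mob() in place of a formula typed out by hand.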

> There are correlations between some variables in my dataframe. Would it be a
> possibility to use always one variable of the correlated variable-pairs for
> partitioning and one for modelling?

You can do that, but you could also do other combinations. That probably 
depends on your application.

hth,
Z

myplot <- function(ctreeobj,
                   col = "black",
                   fill = NULL,
                   beside = NULL,
                   ymax = NULL,
                   ylines = NULL,
                   widths = 1,
                   gap = NULL,
                   reverse = NULL,
                   id = TRUE)
{
      getMaxPred <- function(x) {
        mp <- max(x$prediction)
        mpl <- ifelse(x$terminal, 0, getMaxPred(x$left))
        mpr <- ifelse(x$terminal, 0, getMaxPred(x$right))
        return(max(c(mp, mpl, mpr)))
      }

      y <- response(ctreeobj)[[1]]

      if(is.factor(y) || inherits(y, "was_ordered")) {
          ylevels <- levels(y)
  	if(is.null(beside)) beside <- if(length(ylevels) < 3) FALSE else TRUE
          if(is.null(ymax)) ymax <- if(beside) 1.1 else 1
  	if(is.null(gap)) gap <- if(beside) 0.1 else 0
      } else {
          if(is.null(beside)) beside <- FALSE
          if(is.null(ymax)) ymax <- getMaxPred(ctreeobj@tree) * 1.1
          ylevels <- seq(along = ctreeobj@tree$prediction)
          if(length(ylevels) < 2) ylevels <- ""
  	if(is.null(gap)) gap <- 1
      }
      if(is.null(reverse)) reverse <- !beside
      if(is.null(fill)) fill <- gray.colors(length(ylevels))
      if(is.null(ylines)) ylines <- if(beside) c(3, 2) else c(1.5, 2.5)

      ### panel function for barplots in nodes
      rval <- function(node) {

          ## parameter setup
  	fm <- node$model
          pred <- fm$family$linkinv(coef(fm))
  	if(reverse) {
  	  pred <- rev(pred)
  	  ylevels <- rev(ylevels)
  	}
          np <- length(pred)
  	nc <- if(beside) np else 1

  	fill <- rep(fill, length.out = np)
          widths <- rep(widths, length.out = nc)
  	col <- rep(col, length.out = nc)
  	ylines <- rep(ylines, length.out = 2)

  	gap <- gap * sum(widths)
          yscale <- c(0, ymax)
          xscale <- c(0, sum(widths) + (nc+1)*gap)

          top_vp <- viewport(layout = grid.layout(nrow = 2, ncol = 3,
                             widths = unit(c(ylines[1], 1, ylines[2]), c("lines", "null", "lines")),
                             heights = unit(c(1, 1), c("lines", "null"))),
                             width = unit(1, "npc"),
                             height = unit(1, "npc") - unit(2, "lines"),
  			   name = paste("node_barplot", node$nodeID, sep = ""))

          pushViewport(top_vp)
          grid.rect(gp = gpar(fill = "white", col = 0))

          ## main title
          top <- viewport(layout.pos.col=2, layout.pos.row=1)
          pushViewport(top)
  	mainlab <- paste(ifelse(id, paste("Node", node$nodeID, "(n = "), "n = "),
  	                 sum(node$weights), ifelse(id, ")", ""), sep = "")
          grid.text(mainlab)
          popViewport()

          plot <- viewport(layout.pos.col=2, layout.pos.row=2,
                           xscale=xscale, yscale=yscale,
  			 name = paste("node_barplot", node$nodeID, "plot",
                           sep = ""))

          pushViewport(plot)

  	if(beside) {
    	  xcenter <- cumsum(widths+gap) - widths/2
  	  for (i in 1:np) {
              grid.rect(x = xcenter[i], y = 0, height = pred[i],
                        width = widths[i],
  	              just = c("center", "bottom"), default.units = "native",
  	              gp = gpar(col = col[i], fill = fill[i]))
  	  }
            if(length(xcenter) > 1) grid.xaxis(at = xcenter, label = FALSE)
  	  grid.text(ylevels, x = xcenter, y = unit(-1, "lines"),
                      just = c("center", "top"),
  	            default.units = "native", check.overlap = TRUE)
            grid.yaxis()
  	} else {
    	  ycenter <- cumsum(pred) - pred

  	  for (i in 1:np) {
              grid.rect(x = xscale[2]/2, y = ycenter[i], height = min(pred[i], ymax - ycenter[i]),
                        width = widths[1],
  	              just = c("center", "bottom"), default.units = "native",
  	              gp = gpar(col = col[i], fill = fill[i]))
  	  }
            if(np > 1) {
  	    grid.text(ylevels[1], x = unit(-1, "lines"), y = 0,
                        just = c("left", "center"), rot = 90,
  	              default.units = "native", check.overlap = TRUE)
  	    grid.text(ylevels[np], x = unit(-1, "lines"), y = ymax,
                        just = c("right", "center"), rot = 90,
  	              default.units = "native", check.overlap = TRUE)
  	  }
            if(np > 2) {
  	    grid.text(ylevels[-c(1,np)], x = unit(-1, "lines"), y = ycenter[-c(1,np)],
                        just = "center", rot = 90,
  	              default.units = "native", check.overlap = TRUE)
  	  }
            grid.yaxis(main = FALSE)
  	}

          grid.rect(gp = gpar(fill = "transparent"))
          upViewport(2)
      }

      return(rval)
}
class(myplot) <- "grapcon_generator"

> I would be very happy if somebody could give me some hints or answers to my
> questions.
>
> Many thanks in advance.
>
> B.