[R] MOB (party package) Question - Variable Selection
Achim Zeileis
Achim.Zeileis at uibk.ac.at
Wed Aug 7 19:37:53 CEST 2013
Michael:
> Hi. I am a grad student and I'm currently using the MOB function in the
> R party package and I had a question. I am working on an environmental
> problem with about 100 predictors. I am having trouble determining which
> predictors to use for regression and which for partitioning. Is there
> any sort of method to determine this?
That depends a little bit on what exactly you are trying to achieve. When
we developed MOB, we had the following situation in mind:
- You have some sort of data for which you know from the literature that a
certain type of model works well. For example, log(y) ~ log(x1) + log(x2)
or something like that.
- But you also have data on a bunch of other variables for which you don't
yet know how they should enter the model. Often these are categorical
variables or numerical variables that are not part of the standard theory.
- Then MOB is one possible approach to check whether these additional
variables affect the basic standard model or not. Through recursive
partitioning, it can capture various types of main and interaction
effects.
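To make the split between "regression" and "partitioning" variables concrete, here is a minimal sketch using mob() from party on simulated data. The variable names (y, x1, x2, z1, z2) and the data-generating process are hypothetical and exist only for illustration: the basic model is log(y) ~ log(x1) + log(x2), and the slope on log(x1) is made to depend on the factor z1, while z2 is pure noise. The transformed variables are computed beforehand, and the control setting is just one reasonable choice, not a recommendation.

```r
# Sketch only: simulated data, hypothetical variable names.
library("party")  # provides mob(), linearModel and mob_control()

set.seed(42)
n <- 600
d <- data.frame(
  x1 = runif(n, 1, 10),
  x2 = runif(n, 1, 10),
  z1 = factor(sample(c("a", "b"), n, replace = TRUE)),  # candidate partitioning variable
  z2 = rnorm(n)                                          # pure noise
)
# The slope on log(x1) differs between the two levels of z1 (an interaction).
slope <- ifelse(d$z1 == "a", 0.5, 1.5)
d$y <- exp(1 + slope * log(d$x1) + 0.3 * log(d$x2) + rnorm(n, sd = 0.2))

# Precompute the transformed variables used in the basic model.
d$ly  <- log(d$y)
d$lx1 <- log(d$x1)
d$lx2 <- log(d$x2)

# Regressors go to the left of "|", partitioning variables to the right.
fit <- mob(ly ~ lx1 + lx2 | z1 + z2,
           data = d, model = linearModel,
           control = mob_control(minsplit = 40))

print(fit)
coef(fit)  # one row of regression coefficients per terminal node
```

With data like these, one would expect the tree to split on z1 and to leave z2 alone, so each terminal node carries its own small, interpretable linear model.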
However, suppose you just have a response variable and a bunch of
regressors about which you have little prior knowledge, and you want to
select both the relevant variables and their functional form. Then MOB
might help you, but other methods may be more natural, for example GAMs
or boosting.
> Does it cause problems if a variable is used for both regression and
> partitioning?
In principle, this is possible. Whether or not this is meaningful and/or
easy to interpret depends on the particular data, though.
> I attempted to pre-screen the variables using stepwise linear regression
> and I used the selected variables for regression and all others for
> partitioning. However, this led to a model with only one node.
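That's not very surprising, is it? You already tried to capture the
potential influence of all regressors on your response, so the stepwise
selection had already picked up the variables with a detectable effect,
leaving little for the partitioning variables to explain. Of course, MOB
might have turned up a few additional interactions, but I'm not surprised
that it didn't.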
We've obtained the most useful results when the basic model had relatively
few parameters and was easy/natural to interpret.
Hope that helps,
Z