[R] Question about rpart(sth~.,database)
Gavin Simpson
gavin.simpson at ucl.ac.uk
Sun Apr 19 13:13:50 CEST 2009
Grześ wrote:
> I have a standard database - HouseVotes84
> For example:
>   Class      V1   V2 V3 V4   V5 V6 V7 V8 V9 V10 V11  V12 V13 V14 V15 V16
> 1 republican n    y  n  y    y  y  n  n  n  y   <NA> y   y   y   n   y
> 2 republican n    y  n  y    y  y  n  n  n  n   n    y   y   y   n   <NA>
> 3 democrat   <NA> y  y  <NA> y  y  n  n  n  n   y    n   y   y   n   n
> .
> .
> .
> and I build a tree like this:
>> hv.tree1=rpart(Class~.,HouseVotes84)
> Everything is OK! My question is:
> What exactly does "Class~.," mean?
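The "." means "all remaining variables in HouseVotes84": they are placed
on the right-hand side (rhs) of the formula, i.e. used as the variables
that predict the Class variable. The trailing comma is not part of the
formula; it only separates the formula from the next argument to rpart().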
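As a small illustration of how the dot is expanded, here is a sketch using
a made-up four-row stand-in for HouseVotes84 (the data frame `votes` and
its values are invented for the example, not taken from the real data).
terms() from base R shows which variables end up on the rhs:

```r
# Toy stand-in for HouseVotes84 (hypothetical data, three vote columns only)
votes <- data.frame(
  Class = factor(c("republican", "republican", "democrat", "democrat")),
  V1    = factor(c("n", "n", "y", "y")),
  V2    = factor(c("y", "y", "n", "y")),
  V3    = factor(c("n", "n", "y", "y"))
)

# terms() expands the dot against the columns of 'data';
# the response (Class) is left out of the expansion
rhs_class <- attr(terms(Class ~ ., data = votes), "term.labels")
print(rhs_class)
# [1] "V1" "V2" "V3"
```

So with this toy data, Class ~ . is shorthand for Class ~ V1 + V2 + V3.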
>
> Why do I get the best solution when I use "Class~.,", but when I use a
> parameter like this:
>> hv.tree2=rpart(V2~.,HouseVotes84)
Why does this surprise you? You are now trying to predict the variable
V2 (y/n) from Class and all remaining variables.
> I also get a solution, but not as good as before.
They are solutions to two different problems.
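To make the difference concrete, the sketch below compares the two
formulae on a made-up stand-in for HouseVotes84. The data frame `votes`
and the helper `describe_formula` are invented names for this example
only; the point is just that changing the lhs changes both the response
and the set of predictors:

```r
# Toy stand-in for HouseVotes84 (hypothetical data, three vote columns only)
votes <- data.frame(
  Class = factor(c("republican", "republican", "democrat", "democrat")),
  V1    = factor(c("n", "n", "y", "y")),
  V2    = factor(c("y", "y", "n", "y")),
  V3    = factor(c("n", "n", "y", "y"))
)

# Hypothetical helper: report the response and the expanded predictors
describe_formula <- function(f, data) {
  tt <- terms(f, data = data)
  list(response   = deparse(f[[2]]),          # lhs of the formula
       predictors = attr(tt, "term.labels"))  # rhs after expanding the dot
}

m1 <- describe_formula(Class ~ ., votes)
m2 <- describe_formula(V2 ~ ., votes)

str(m1)  # response "Class", predictors V1, V2, V3
str(m2)  # response "V2",    predictors Class, V1, V3
```

In the second case Class has moved to the rhs and is itself used as a
predictor of V2, so the two fitted trees are not comparable.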
If you want to predict Class, then you need
rpart(Class ~ ., data = HouseVotes84)
or, to specify exactly which variables to use as predictors of Class,
state them explicitly:
rpart(Class ~ V1 + V3 + V4, data = HouseVotes84)
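If the predictor names are held in a character vector, an explicit
formula of this kind can also be built with reformulate() from the stats
package. This is only a sketch: the vector name `preds` is made up, and
the choice of V1, V3 and V4 simply repeats the formula just given.

```r
preds <- c("V1", "V3", "V4")                 # hypothetical choice of predictors
f <- reformulate(preds, response = "Class")  # builds a formula object
cat(deparse(f), "\n")
# Class ~ V1 + V3 + V4

# The formula could then be passed on, e.g. rpart(f, data = HouseVotes84),
# assuming the rpart package is loaded and HouseVotes84 is available
# (it ships with the mlbench package).
```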
I think you should look at the documentation that comes with R (An
Introduction to R) or some of the contributed help documents on the R
Website to read up on model formulae and how to represent models using
this notation.
HTH
G