[R] Fwd: varimp_in_party_package
Torsten Hothorn
Torsten.Hothorn at stat.uni-muenchen.de
Tue Jun 21 17:57:07 CEST 2011
On Thu, 16 Jun 2011, Jinrui Xu wrote:
> Thanks for your feedback.
> I think the problem is not because of many levels. There is only 1 column
> with two levels as class labels in my input data.
>
> Below is my code. The command line "data.cforest.varimp <-
> varimp(data.cforest, conditional = TRUE)" reports "Error in
> model.matrix.default(as.formula(f),data = blocks): term 1 would require 4e+17
> columns"
>
> I also attached my input file. Hope you can run it for me to check what the
> problem is. Thanks a lot!
>
> PS: The code below takes about 10 minutes on 1 CPU and uses up to 2.5 GB of
> memory. You can reduce the number of columns in rawinput, which lowers the
> computational cost and still reproduces the same error.
>
> library(randomForest)
> library(party)
>
> set.seed(71)
>
> rawinput <- read.table("featureSelection_rec.vectors")
> rawinput$V1 <- as.factor(as.numeric(rawinput$V1))
>
> data.controls <- cforest_unbiased(ntree=500, mtry=3)
> data.cforest <- cforest(V1~.,data=rawinput,controls=data.controls)
> data.cforest.varimp <- varimp(data.cforest, conditional = TRUE)
>
Hi Jinrui,
it turns out that for your data set there are (using the default
parameters) 47 variables to condition on, and that's way too many. You can
reduce the number of conditioning variables by increasing the `threshold'
parameter to something like 0.8.
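
As a rough illustration of that suggestion, here is a minimal sketch. It
assumes a small simulated data set (`dat', hypothetical) in place of the
original input file, and a smaller forest (ntree = 50) so that it runs
quickly. The only point it shows is passing `threshold' to varimp(); the
documented default is 0.2, and a larger value lets fewer covariates into the
conditioning scheme.

```r
library(party)

set.seed(71)

# Hypothetical stand-in for the original input file: a two-level class
# label V1 and ten numeric predictors V2..V11, two of them correlated.
n <- 200
x <- matrix(rnorm(n * 10), nrow = n)
x[, 2] <- x[, 1] + rnorm(n, sd = 0.3)  # make V3 correlated with V2
dat <- data.frame(V1 = factor(ifelse(x[, 1] + rnorm(n) > 0, 1, 0)), x)
names(dat) <- paste0("V", 1:11)

# Smaller forest than in the original post, only to keep the sketch fast.
data.controls <- cforest_unbiased(ntree = 50, mtry = 3)
data.cforest <- cforest(V1 ~ ., data = dat, controls = data.controls)

# Raise threshold above its default of 0.2 so that only covariates strongly
# associated with the variable of interest are conditioned on.
data.cforest.varimp <- varimp(data.cforest, conditional = TRUE,
                              threshold = 0.8)
print(sort(data.cforest.varimp, decreasing = TRUE))
```

Whether 0.8 is a good value for a particular data set is a judgment call; it
trades a coarser conditioning scheme for a grid that stays small enough to
compute.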
Best,
Torsten
>
>
>
>> there is a factor with (too) many levels in your data frame `rawinput'.
>>
>> summary(rawinput)
>>
>> will tell you which one.
>>
>> Torsten
>
>
>
> Quoting Torsten Hothorn <Torsten.Hothorn at stat.uni-muenchen.de>:
>
>>>
>>> Hello everyone,
>>>
>>> I use the following command lines to get variable importances from a
>>> training dataset.
>>>
>>>
>>> data.controls <- cforest_unbiased(ntree=500, mtry=3)
>>> data.cforest <- cforest(V1~.,data=rawinput,controls=data.controls)
>>> data.cforest.varimp <- varimp(data.cforest, conditional = TRUE)
>>>
>>> I got error: "Error in model.matrix.default(as.formula(f),data = blocks):
>>> term 1 would require 4e+17 columns"
>>>
>>>
>>> I changed the data dimension to 150. The problem still exists, so I guess
>>> there are other problems. Please give me some help or hints. Thanks!
>>>
>>> jinrui,
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>
>