[R] Party package: varimp(..., conditional=TRUE) error: term 1 would require 9e+12 columns (fwd)

Torsten Hothorn Torsten.Hothorn at R-project.org
Mon Oct 17 16:53:32 CEST 2011


>
> I would like to build a forest of regression trees to see how well some
> covariates predict a response variable and to examine the importance of
> the covariates. I have a small number of covariates (8) and large number
> of records (27368). The response and all of the covariates are continuous
> variables.
>
> A cursory examination of the covariates does not suggest they are
> correlated in a simple fashion (e.g. the variance inflation factors are
> all fairly low) but common sense suggests there should be some
> relationship: one of them is the day of the year and some of the others
> are environmental parameters such as water temperature. For this reason I
> would like to follow the advice of Strobl et al. (2008) and try the
> authors' conditional variable importance measure. This is implemented in
> the party package by calling varimp(..., conditional=TRUE).
> Unfortunately, when I call that on my forest I receive the error:
>
>> varimp(myforest, conditional=TRUE)
> Error in model.matrix.default(as.formula(f), data = blocks) :
>  term 1 would require 9e+12 columns
>
> Does anyone know what is wrong?
>

Hi Jason,

the conditional importance feature doesn't scale well in its current
implementation. Anyway, thanks for looking closely at the previous
reports. I can offer to have a look at your data if you send them along
with the code to reproduce the problem.

Best,

Torsten

> I noticed a post in June 2011 where a user reported this message and the
> ultimate problem was that the importance measure was being conditioned on
> too many variables (47). I have only a small number of variables here so I
> guessed that was not the problem.
>
> Another suggestion was that there could be a factor with too many levels.
> In my case, all of the variables are continuous. Term 1 (x1 below) is the
> day of the year, which does happen to be integers 1 ... 366. But the
> variable is class numeric, not integer, so I don't believe cforest would
> treat it as a factor, although I do not know how to tell whether cforest
> is treating something as continuous or as a factor.
>
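On the question of how to tell a continuous column from a factor: a minimal
sketch in base R, assuming (as is usual for R modelling functions, though not
checked here against the party source) that a column is handled as categorical
only when it is stored as a factor. The data frame `toy` and its values are
made up for illustration; only the column names follow the post.

```r
# Toy stand-in for the poster's data frame; values are invented.
toy <- data.frame(
  y  = c(0.0, 1.3, 5.4, 2.1),
  x1 = c(1, 105, 169, 366),        # whole numbers, but stored as double
  x2 = c(0.00, 0.24, 0.43, 1.00)
)

# Storage class of every column
col_classes <- sapply(toy, class)
print(col_classes)

# TRUE only for columns that are stored as factors
factor_cols <- sapply(toy, is.factor)
print(factor_cols)

# Whole-number values alone do not make a column a factor ...
x1_is_factor <- is.factor(toy$x1)

# ... only an explicit conversion does
x1_as_factor <- factor(toy$x1)
print(nlevels(x1_as_factor))
```

Running the same two sapply() calls on the real data frame would show whether
any covariate is stored as a factor before it is passed to cforest().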
> Thank you for any help you can provide. I am running R 2.13.1 with party
> 0.9-99994. You can download the data from
> http://www.duke.edu/~jjr8/data.rdata (512 KB). Here is the complete code:
>
>> load("\\Temp\\data.rdata")
>> nrow(df)
> [1] 27368
>> summary(df)
>                 y     x1      x2      x3    x4         x5     x6         x7        x8
> Min.        0.000    1.0  0.0000    1.00    52   0.008184  16.71  0.0000000   0.02727
> 1st Qu.     0.000  105.0  0.0000   30.00  1290   6.747035  23.92  0.0000000   0.11850
> Median      1.282  169.0  0.2353   38.00  1857  11.310277  26.35  0.0001569   0.14625
> Mean        5.651  178.7  0.2555   55.03  1907  12.889021  26.31  0.0162043   0.20684
> 3rd Qu.     5.353  262.0  0.4315   47.00  2594  18.427410  28.95  0.0144660   0.20095
> Max.      195.238  366.0  1.0000  400.00  3832  29.492380  31.73  0.3157486  11.76877
>> library(HH)
> <output deleted>
>> vif(y ~ ., data=df)
>      x1       x2       x3       x4       x5       x6       x7       x8
> 1.374583 1.252250 1.021672 1.218801 1.015124 1.439868 1.075546 1.060580
>> library(party)
> <output deleted>
>> mycontrols <- cforest_unbiased(ntree=50, mtry=3)  # Small forest but requires a few minutes
>> myforest <- cforest(y ~ ., data=df, controls=mycontrols)
>> varimp(myforest)
>         x1         x2         x3         x4         x5         x6         x7         x8
>  11.924498 103.180195  16.228864  30.658946   5.053500  12.820551   2.113394   6.911377
>> varimp(myforest, conditional=TRUE)
> Error in model.matrix.default(as.formula(f), data = blocks) :
>  term 1 would require 9e+12 columns
>
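One possible reading of the error message, offered as an assumption rather
than a statement about party's internals: if the conditioning variables are
cut into intervals and model.matrix() is asked for their full interaction,
the number of columns is the product of the interval counts, which grows very
quickly. The factors a, b and c below are hypothetical and only illustrate
that arithmetic in base R.

```r
# Hypothetical toy data: three factors standing in for discretised
# conditioning variables (names and level counts are illustrative only).
set.seed(1)
n <- 200
blocks <- data.frame(
  a = factor(sample(1:4, n, replace = TRUE), levels = 1:4),
  b = factor(sample(1:5, n, replace = TRUE), levels = 1:5),
  c = factor(sample(1:6, n, replace = TRUE), levels = 1:6)
)

# One indicator column per combination of levels of a, b and c
mm <- model.matrix(~ a:b:c - 1, data = blocks)
n_cols <- ncol(mm)

# The same count, computed directly from the numbers of levels
n_expected <- prod(sapply(blocks, nlevels))

print(c(columns = n_cols, product_of_levels = n_expected))
```

Both counts equal 4 * 5 * 6 = 120 here. With more variables, or more
intervals per variable, the product can quickly exceed what model.matrix()
is able to allocate.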


