[R] Question about Cubist Model

Mxkuhn mxkuhn at gmail.com
Fri Jan 13 00:04:29 CET 2017



> On Jan 12, 2017, at 5:37 PM, Lorenzo Isella <lorenzo.isella at gmail.com> wrote:
> 
> Dear All,
> I am fine tuning a Cubist model (see
> https://cran.r-project.org/web/packages/Cubist/index.html).
> I am a bit puzzled by its output. On a dataset which contains 275
> cases, I get non mutually exclusive rules.
> E.g., in the output below, rules 2 and 3 cover all the 275 cases of
> the data set and rule 1 overlaps partially.
> Am I misunderstanding something?

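It is doing the right thing. The rules are first derived from a regression tree and, in the process of pruning the rules, they can end up covering overlapping sets of cases. In your output, for example, every case that satisfies rule 1 also satisfies rule 2. When the rules overlap, the regression output is the average across the active rules, i.e. the mean of the linear-model predictions from every rule whose conditions the case meets.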
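
To make the averaging concrete, here is a rough base-R sketch using the three rules from your output. The helper predict_rules() and the example data frame new_cases are made up for illustration and are not part of the Cubist package; the sketch also ignores anything else the real predict method may do, so treat it as an approximation of the idea rather than a reimplementation.

```r
# Rules transcribed from the quoted Cubist output: each rule has a
# condition (applies) and a linear model (model).
rules <- list(
  list(
    applies = function(d) d$home_copub_after_all <= 0.7142857 &
                          d$host_copub_after_all <= 1.833333,
    model   = function(d) 0.1666667 + 0.9 * d$home_copub_after_all +
                          0.11 * d$home_copub_before_all
  ),
  list(
    applies = function(d) d$host_copub_after_all <= 1.833333,
    model   = function(d) 0.0433333 + 0.75 * d$home_copub_after_all +
                          0.33 * d$host_copub_after_all +
                          0.37 * d$top_10_after_all
  ),
  list(
    applies = function(d) d$host_copub_after_all > 1.833333,
    model   = function(d) 1.595 + 1.03 * d$top_10_after_all +
                          0.45 * d$home_copub_after_all
  )
)

# Hypothetical helper (not a Cubist function): average the predictions
# of all rules whose conditions hold for each row.
predict_rules <- function(d, rules) {
  # n x k matrices: one row per case, one column per rule
  active <- do.call(cbind, lapply(rules, function(r) r$applies(d)))
  preds  <- do.call(cbind, lapply(rules, function(r) r$model(d)))
  # Inactive rules contribute 0 to the sum; divide by the number of
  # active rules. Rules 2 and 3 together cover every non-missing case,
  # so the denominator is never 0 here.
  rowSums(preds * active) / rowSums(active)
}

# Made-up cases: the first is covered by rules 1 and 2, the second by
# rule 2 only, the third by rule 3 only.
new_cases <- data.frame(
  home_copub_after_all  = c(0.5, 1.0, 1.0),
  host_copub_after_all  = c(1.0, 1.0, 2.5),
  home_copub_before_all = c(0.2, 0.2, 0.2),
  top_10_after_all      = c(0.3, 0.3, 0.3)
)

print(predict_rules(new_cases, rules))
```

For the first case the result is the mean of the rule 1 and rule 2 predictions; for the other two cases only a single rule is active, so its prediction is used unchanged.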

Thanks,

Max

> Many thanks
> 
> Lorenzo
> 
> 
> 
> 
> Cubist [Release 2.07 GPL Edition]  Thu Jan 12 23:10:40 2017
> ---------------------------------
> 
>   Target attribute `outcome'
> 
> Read 275 cases (21 attributes) from undefined.data
> 
> Model:
> 
> Rule 1: [204 cases, mean 0.5393324, range 0 to 2.285714, est err 0.2598495]
> 
>   if
>       home_copub_after_all <= 0.7142857
>       host_copub_after_all <= 1.833333
>   then
>       outcome = 0.1666667 + 0.9 home_copub_after_all
>                 + 0.11 home_copub_before_all
> 
> Rule 2: [259 cases, mean 0.7445303, range 0 to 3.166667, est err 0.1866440]
> 
>   if
>       host_copub_after_all <= 1.833333
>   then
>       outcome = 0.0433333 + 0.75 home_copub_after_all
>                 + 0.33 host_copub_after_all + 0.37 top_10_after_all
> 
> Rule 3: [16 cases, mean 4.4285712, range 2.142857 to 8.857142, est err 1.0346190]
> 
>   if
>       host_copub_after_all > 1.833333
>   then
>       outcome = 1.595 + 1.03 top_10_after_all + 0.45 home_copub_after_all
> 
> 
> Evaluation on training data (275 cases):
> 
>     Average  |error|          0.2678023
>     Relative |error|               0.38
>     Correlation coefficient        0.94
> 
> 
>     Attribute usage:
>       Conds  Model
> 
>       100%    54%    host_copub_after_all
>        43%   100%    home_copub_after_all
>               57%    top_10_after_all
>               43%    home_copub_before_all
> 
> 
> Time: 0.0 secs


