[R] Question about Cubist Model
Mxkuhn
mxkuhn at gmail.com
Fri Jan 13 00:04:29 CET 2017
> On Jan 12, 2017, at 5:37 PM, Lorenzo Isella <lorenzo.isella at gmail.com> wrote:
>
> Dear All,
> I am fine tuning a Cubist model (see
> https://cran.r-project.org/web/packages/Cubist/index.html).
> I am a bit puzzled by its output. On a dataset which contains 275
> cases, I get non mutually exclusive rules.
> E.g., in the output below, rules 2 and 3 cover all the 275 cases of
> the data set and rule 1 overlaps partially.
> Am I misunderstanding something?
It is doing the right thing. The rules are first derived from a regression tree and, in the process of pruning, the rules can end up with overlapping coverage. When rules overlap, the regression predictions are averaged across the active rules.
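For example, here is a minimal sketch of the same behaviour (using the built-in mtcars data purely for illustration; the rules and coefficients will of course differ from yours):

  library(Cubist)

  # fit a rule-based model: mpg as a function of the other columns
  fit <- cubist(x = mtcars[, -1], y = mtcars$mpg)

  # print the rules; a case that satisfies the conditions of more than
  # one rule is counted in every rule it satisfies, so the per-rule
  # case counts can add up to more than the number of rows
  summary(fit)

  # for such a case, the prediction is the average of the linear
  # models attached to all of the active rules
  predict(fit, mtcars[, -1])

In your output, a case with home_copub_after_all <= 0.7142857 and host_copub_after_all <= 1.833333 satisfies the conditions of both Rule 1 and Rule 2, so its prediction is the mean of those two rules' linear models.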
Thanks,
Max
> Many thanks
>
> Lorenzo
>
>
>
>
> Cubist [Release 2.07 GPL Edition] Thu Jan 12 23:10:40 2017
> ---------------------------------
>
> Target attribute `outcome'
>
> Read 275 cases (21 attributes) from undefined.data
>
> Model:
>
> Rule 1: [204 cases, mean 0.5393324, range 0 to 2.285714, est err 0.2598495]
>
> if
> home_copub_after_all <= 0.7142857
> host_copub_after_all <= 1.833333
> then
> outcome = 0.1666667 + 0.9 home_copub_after_all
> + 0.11 home_copub_before_all
>
> Rule 2: [259 cases, mean 0.7445303, range 0 to 3.166667, est err 0.1866440]
>
> if
> host_copub_after_all <= 1.833333
> then
> outcome = 0.0433333 + 0.75 home_copub_after_all
> + 0.33 host_copub_after_all + 0.37 top_10_after_all
>
> Rule 3: [16 cases, mean 4.4285712, range 2.142857 to 8.857142, est err 1.0346190]
>
> if
> host_copub_after_all > 1.833333
> then
> outcome = 1.595 + 1.03 top_10_after_all + 0.45 home_copub_after_all
>
>
> Evaluation on training data (275 cases):
>
> Average |error| 0.2678023
> Relative |error| 0.38
> Correlation coefficient 0.94
>
>
> Attribute usage:
>   Conds  Model
>
>    100%    54%    host_copub_after_all
>     43%   100%    home_copub_after_all
>            57%    top_10_after_all
>            43%    home_copub_before_all
>
>
> Time: 0.0 secs