[R] ppr, number of terms, and data ordering
david.beede@mail.doc.gov
Wed Jun 6 18:11:48 CEST 2001
Dear R listers --
I have several questions about using the ppr function in the modreg package.
I discovered -- quite by accident -- that if I re-order the data, I obtain different
results. The output below shows what I mean. I have two datasets (dataset1 and dataset2)
that are identical (tested using proc compare in SAS) except for the fact that the records
are in different order. Below I have pasted in the results from running ppr on the two
data sets in their original order (first and third sets of results below) and running
ppr after sorting the datasets by idnum (second and fourth sets of results below).
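The same identity check can also be made inside R rather than SAS. The following is a minimal sketch, assuming each data frame carries a unique idnum column; the tiny data frames and the helper sort_by_id() are made up for illustration and are not the real data.

```r
# Two toy data frames with the same records in a different row order
# (illustrative values only, not the real dataset1/dataset2).
dataset_a <- data.frame(idnum = c(3, 1, 2), award = c(30, 10, 20))
dataset_b <- dataset_a[c(2, 3, 1), ]

# Hypothetical helper: sort by idnum and drop the old row names so that
# only the contents are compared.
sort_by_id <- function(d) {
  out <- d[order(d$idnum), ]
  rownames(out) <- NULL
  out
}

# TRUE when the two data frames hold the same records.
same_content <- isTRUE(all.equal(sort_by_id(dataset_a), sort_by_id(dataset_b)))
print(same_content)
```

all.equal() compares numeric columns up to a small tolerance; identical() could be used instead if an exact match is wanted.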
At first I thought that perhaps only the regression parameters differed while the underlying
fits were equivalent, but the predicted values are significantly different as well. I tried
increasing the bass parameter, thinking that perhaps I was overfitting the data,
but the differences in the regression parameters remained. Finally, I originally had lots
of other RHS variables, including indicator variables; stripping those variables out
did not change my findings, as shown below.
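A comparison of predicted values across two row orderings can be sketched as follows. This is only an illustration on simulated data: the variables x1, x2, y and the sample size are assumptions, and in current R versions ppr lives in the stats package rather than modreg. Both fits are evaluated on the same rows in the same order, so any difference comes from the fits and not from misaligned records.

```r
set.seed(1)
n <- 200

# Simulated stand-in for the real data (assumed, not the poster's data).
dat <- data.frame(idnum = sample(n), x1 = rnorm(n), x2 = rnorm(n))
dat$y <- sin(dat$x1 + dat$x2) + 0.1 * rnorm(n)

# Fit once on the original order and once on the idnum-sorted copy.
sorted <- dat[order(dat$idnum), ]
fit_orig   <- ppr(y ~ x1 + x2, data = dat,    nterms = 2, max.terms = 5)
fit_sorted <- ppr(y ~ x1 + x2, data = sorted, nterms = 2, max.terms = 5)

# Predict from both fits on the same rows, in the same order.
pred_orig   <- predict(fit_orig,   newdata = sorted)
pred_sorted <- predict(fit_sorted, newdata = sorted)

# Largest absolute disagreement between the two sets of predictions.
max_diff <- max(abs(pred_orig - pred_sorted))
print(max_diff)
```

Whether max_diff is zero or not will depend on the data and the R version; the sketch only shows how to line the predictions up before comparing them.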
My first question is: is there a recommended way to sort the data before running ppr?
In the meantime, I'll try sorting by my two continuous RHS variables to see if it makes
a difference -- not a definitive answer but it may be suggestive.
My second question is: is my method for deciding on the number of terms in the regression
okay? What I am doing is first running ppr with a large maximum number of terms, then
finding the number of terms that minimizes the goodness-of-fit statistic. Looking at the
cpus example in the section of MASS that deals with ppr (pp. 293-294), it is unclear why
eight terms were finally chosen, when using ten terms yields a lower goodness-of-fit
statistic.
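The goodness-of-fit values printed by ppr are computed on the data used for fitting, so their minimum may favor too many terms. One alternative is to score each candidate number of terms on held-out rows. The sketch below assumes simulated data, a 200/100 train/test split, and candidate sizes 1 to 6; none of these come from the real problem.

```r
set.seed(42)
n <- 300

# Simulated data (assumed for illustration only).
sim <- data.frame(x1 = runif(n), x2 = runif(n))
sim$y <- sin(2 * pi * sim$x1) + (sim$x2 - 0.5)^2 + rnorm(n, sd = 0.2)

# Hold out 100 rows for scoring.
train_rows <- sample(n, 200)
train <- sim[train_rows, ]
test  <- sim[-train_rows, ]

# Fit one model per candidate size and record its held-out RMSE.
candidates <- 1:6
holdout_rmse <- sapply(candidates, function(k) {
  fit <- ppr(y ~ x1 + x2, data = train, nterms = k, max.terms = 8)
  sqrt(mean((test$y - predict(fit, newdata = test))^2))
})

# Candidate size with the smallest held-out error.
best_k <- candidates[which.min(holdout_rmse)]
print(round(holdout_rmse, 4))
print(best_k)
```

A single split is noisy; repeating it over several random splits, or using cross-validation, would give a steadier picture.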
Finally, in the same example in MASS, where does the test.cpus() function come from? I
couldn't find it in the MASS table of contents on CRAN.
Thanks in advance for any help you can give!
######DATASET1, ORIGINAL ORDER
> pprfile.ppr <- ppr(
+ award~
+ ilogemp+ilogage,
+ data=dataset1, nterms=1, max.terms=40, optlevel=3, bass=0
+ )
> pprfile.ppr
Call:
ppr.formula(formula = award ~ ilogemp + ilogage, data = dataset1,
nterms = 1, max.terms = 40, optlevel = 3, bass = 0)
Goodness of fit:
    1 terms     2 terms     3 terms     4 terms     5 terms     6 terms
94153840890 97278012020 86500905681 55028690369 53258726650 48006301832
    7 terms     8 terms     9 terms    10 terms    11 terms    12 terms
64844079166 44746090412  4502608255  2047094369  4361229925  2811887563
   13 terms    14 terms    15 terms    16 terms    17 terms    18 terms
 4960791975  2717103497  2265582134  2288868136  2470815605  4989632044
   19 terms    20 terms    21 terms    22 terms    23 terms    24 terms
 4966101666  3993722223  4000594447  3925383715  7636913238  7714228211
   25 terms    26 terms    27 terms    28 terms    29 terms    30 terms
 7378463928  7035211389  7007446263 10399858547           0           0
   31 terms    32 terms    33 terms    34 terms    35 terms    36 terms
          0           0           0           0           0           0
   37 terms    38 terms    39 terms    40 terms
          0           0           0           0
> numterm <- which.min(pprfile.ppr$gofn[pprfile.ppr$gofn > 0])
> summary(update(pprfile.ppr,nterms=numterm))
Call:
ppr.formula(formula = award ~ ilogemp + ilogage, data = dataset1,
nterms = numterm, max.terms = 40, optlevel = 3, bass = 0)
Goodness of fit:
10 terms 11 terms 12 terms 13 terms 14 terms 15 terms
2047094369 4361229925 2811887563 4960791975 2717103497 2265582134
16 terms 17 terms 18 terms 19 terms 20 terms 21 terms
2288868136 2470815605 4989632044 4966101666 3993722223 4000594447
22 terms 23 terms 24 terms 25 terms 26 terms 27 terms
3925383715 7636913238 7714228211 7378463928 7035211389 7007446263
28 terms 29 terms 30 terms 31 terms 32 terms 33 terms
10399858547 0 0 0 0 0
34 terms 35 terms 36 terms 37 terms 38 terms 39 terms
0 0 0 0 0 0
40 terms
0
Projection direction vectors:
             term 1      term 2      term 3      term 4      term 5      term 6
ilogemp -0.67134667 -0.02873846 -0.73893911  0.18766759 -0.32913203  0.79075441
ilogage -0.74114348  0.99958697  0.67377221 -0.98223260 -0.94428391 -0.61213353
             term 7      term 8      term 9     term 10
ilogemp -0.73097017  0.46223139 -0.28989935  0.43839056
ilogage  0.68240941 -0.88675935  0.95705714  0.89878458
Coefficients of ridge terms:
  term 1   term 2   term 3   term 4   term 5   term 6   term 7   term 8
53896.97 67906.48 33947.21 54279.37 61051.35 67225.76 65528.85 60914.63
  term 9  term 10
58372.10 86302.62
>
######DATASET1, IDNUM ORDER
> order3 <- order(dataset1$idnum)
> reorder1 <- dataset1[order3,]
> pprfile.ppr <- ppr(
+ award~
+ ilogemp+ilogage,
+ data=reorder1, nterms=1, max.terms=40, optlevel=3, bass=0
+ )
> pprfile.ppr
Call:
ppr.formula(formula = award ~ ilogemp + ilogage, data = reorder1,
nterms = 1, max.terms = 40, optlevel = 3, bass = 0)
Goodness of fit:
     1 terms      2 terms      3 terms      4 terms      5 terms      6 terms
104330711350  96448754204  82123267932  89241900763  62145006339  18715823257
     7 terms      8 terms      9 terms     10 terms     11 terms     12 terms
 13119857589  34779340331  30300680427  10845449181  25437895985  25390506630
    13 terms     14 terms     15 terms     16 terms     17 terms     18 terms
 25715475967  32977127206  33617404958  32855949359  31925878071  34135238643
    19 terms     20 terms     21 terms     22 terms     23 terms     24 terms
           0            0            0            0            0            0
    25 terms     26 terms     27 terms     28 terms     29 terms     30 terms
           0            0            0            0            0            0
    31 terms     32 terms     33 terms     34 terms     35 terms     36 terms
           0            0            0            0            0            0
    37 terms     38 terms     39 terms     40 terms
           0            0            0            0
> numterm <- which.min(pprfile.ppr$gofn[pprfile.ppr$gofn > 0])
> summary(update(pprfile.ppr,nterms=numterm))
Call:
ppr.formula(formula = award ~ ilogemp + ilogage, data = reorder1,
nterms = numterm, max.terms = 40, optlevel = 3, bass = 0)
Goodness of fit:
10 terms 11 terms 12 terms 13 terms 14 terms 15 terms
10845449181 25437895985 25390506630 25715475967 32977127206 33617404958
16 terms 17 terms 18 terms 19 terms 20 terms 21 terms
32855949359 31925878071 34135238643 0 0 0
22 terms 23 terms 24 terms 25 terms 26 terms 27 terms
0 0 0 0 0 0
28 terms 29 terms 30 terms 31 terms 32 terms 33 terms
0 0 0 0 0 0
34 terms 35 terms 36 terms 37 terms 38 terms 39 terms
0 0 0 0 0 0
40 terms
0
Projection direction vectors:
             term 1      term 2      term 3      term 4      term 5      term 6
ilogemp  0.53566733 -0.03152731  0.25486277  0.19557035 -0.35437387  0.78441650
ilogage -0.84442910  0.99950289 -0.96697723 -0.98068967 -0.93510382 -0.62023444
             term 7      term 8      term 9     term 10
ilogemp -0.01470383  0.46360287 -0.27988056 -0.22789063
ilogage -0.99989189 -0.88604310  0.96003483  0.97368674
Coefficients of ridge terms:
  term 1   term 2   term 3   term 4   term 5   term 6   term 7   term 8
26233.34 60989.33 34132.51 64068.54 39630.44 40275.67 28783.82 27284.49
  term 9  term 10
40799.08 49967.01
######DATASET2, ORIGINAL ORDER
>
>
> pprfile.ppr <- ppr(
+ award~
+ ilogemp+ilogage,
+ data=dataset2, nterms=1, max.terms=40, optlevel=3, bass=0
+ )
> pprfile.ppr
Call:
ppr.formula(formula = award ~ ilogemp + ilogage, data = dataset2,
nterms = 1, max.terms = 40, optlevel = 3, bass = 0)
Goodness of fit:
     1 terms      2 terms      3 terms      4 terms      5 terms      6 terms
107514555509  86364622071  76915236151  68363332758  62669868895  66108915227
     7 terms      8 terms      9 terms     10 terms     11 terms     12 terms
 10960310955  11415371966   8181125026   7370110042   7083942083   7695270325
    13 terms     14 terms     15 terms     16 terms     17 terms     18 terms
  9383626616  28487721161  25013255493  30617600484  33616699135  37629909488
    19 terms     20 terms     21 terms     22 terms     23 terms     24 terms
           0            0            0            0            0            0
    25 terms     26 terms     27 terms     28 terms     29 terms     30 terms
           0            0            0            0            0            0
    31 terms     32 terms     33 terms     34 terms     35 terms     36 terms
           0            0            0            0            0            0
    37 terms     38 terms     39 terms     40 terms
           0            0            0            0
> numterm <- which.min(pprfile.ppr$gofn[pprfile.ppr$gofn > 0])
> summary(update(pprfile.ppr,nterms=numterm))
Call:
ppr.formula(formula = award ~ ilogemp + ilogage, data = dataset2,
nterms = numterm, max.terms = 40, optlevel = 3, bass = 0)
Goodness of fit:
11 terms 12 terms 13 terms 14 terms 15 terms 16 terms
7083942083 7695270325 9383626616 28487721161 25013255493 30617600484
17 terms 18 terms 19 terms 20 terms 21 terms 22 terms
33616699135 37629909488 0 0 0 0
23 terms 24 terms 25 terms 26 terms 27 terms 28 terms
0 0 0 0 0 0
29 terms 30 terms 31 terms 32 terms 33 terms 34 terms
0 0 0 0 0 0
35 terms 36 terms 37 terms 38 terms 39 terms 40 terms
0 0 0 0 0 0
Projection direction vectors:
              term 1       term 2       term 3       term 4       term 5
ilogemp  0.740357413 -0.002414494 -0.549155789  0.189738006 -0.375109022
ilogage -0.672213435  0.999997085  0.835720000 -0.981834756 -0.926980702
              term 6       term 7       term 8       term 9      term 10
ilogemp  0.884994686  0.144491176  0.445318057 -0.279893879  0.249729588
ilogage -0.465601123 -0.989506089 -0.895372452  0.960030946 -0.968315616
             term 11
ilogemp -0.245536824
ilogage  0.969387264
Coefficients of ridge terms:
  term 1   term 2   term 3   term 4   term 5   term 6   term 7   term 8
19688.36 46872.70 48959.17 61038.99 50865.32 41610.74 29375.41 25582.86
  term 9  term 10  term 11
51523.50 34629.45 30547.80
>
######DATASET2, IDNUM ORDER
> order4 <- order(dataset2$idnum)
> reorder2 <- dataset2[order4,]
> pprfile.ppr <- ppr(
+ award~
+ ilogemp+ilogage,
+ data=reorder2, nterms=1, max.terms=40, optlevel=3, bass=0
+ )
> pprfile.ppr
Call:
ppr.formula(formula = award ~ ilogemp + ilogage, data = reorder2,
nterms = 1, max.terms = 40, optlevel = 3, bass = 0)
Goodness of fit:
     1 terms      2 terms      3 terms      4 terms      5 terms      6 terms
104330711350  96448754204  82123267932  89241900763  62145006339  18715823257
     7 terms      8 terms      9 terms     10 terms     11 terms     12 terms
 13119857589  34779340331  30300680427  10845449181  25437895985  25390506630
    13 terms     14 terms     15 terms     16 terms     17 terms     18 terms
 25715475967  32977127206  33617404958  32855949359  31925878071  34135238643
    19 terms     20 terms     21 terms     22 terms     23 terms     24 terms
           0            0            0            0            0            0
    25 terms     26 terms     27 terms     28 terms     29 terms     30 terms
           0            0            0            0            0            0
    31 terms     32 terms     33 terms     34 terms     35 terms     36 terms
           0            0            0            0            0            0
    37 terms     38 terms     39 terms     40 terms
           0            0            0            0
> numterm <- which.min(pprfile.ppr$gofn[pprfile.ppr$gofn > 0])
> summary(update(pprfile.ppr,nterms=numterm))
Call:
ppr.formula(formula = award ~ ilogemp + ilogage, data = reorder2,
nterms = numterm, max.terms = 40, optlevel = 3, bass = 0)
Goodness of fit:
10 terms 11 terms 12 terms 13 terms 14 terms 15 terms
10845449181 25437895985 25390506630 25715475967 32977127206 33617404958
16 terms 17 terms 18 terms 19 terms 20 terms 21 terms
32855949359 31925878071 34135238643 0 0 0
22 terms 23 terms 24 terms 25 terms 26 terms 27 terms
0 0 0 0 0 0
28 terms 29 terms 30 terms 31 terms 32 terms 33 terms
0 0 0 0 0 0
34 terms 35 terms 36 terms 37 terms 38 terms 39 terms
0 0 0 0 0 0
40 terms
0
Projection direction vectors:
             term 1      term 2      term 3      term 4      term 5      term 6
ilogemp  0.53566733 -0.03152731  0.25486277  0.19557035 -0.35437387  0.78441650
ilogage -0.84442910  0.99950289 -0.96697723 -0.98068967 -0.93510382 -0.62023444
             term 7      term 8      term 9     term 10
ilogemp -0.01470383  0.46360287 -0.27988056 -0.22789063
ilogage -0.99989189 -0.88604310  0.96003483  0.97368674
Coefficients of ridge terms:
  term 1   term 2   term 3   term 4   term 5   term 6   term 7   term 8
26233.34 60989.33 34132.51 64068.54 39630.44 40275.67 28783.82 27284.49
  term 9  term 10
40799.08 49967.01