[R] ppr, number of terms, and data ordering

david.beede at mail.doc.gov
Wed Jun 6 18:11:48 CEST 2001


Dear R listers --

I have several questions about using the ppr function in the modreg package.

I discovered -- quite by accident -- that if I re-order the data, I obtain different
results.  The output below shows what I mean.  I have two datasets (dataset1 and dataset2)
that are identical (tested using proc compare in SAS) except for the fact that the records
are in different order.  Below I have pasted in the results from running ppr on the two
data sets in their original order (first and third sets of results below) and running
ppr after sorting the datasets by idnum (second and fourth sets of results below).

At first I thought that perhaps the regression parameters differed while the underlying
fits were equivalent, but the predicted values are significantly different.  I tried
increasing the bass parameter, thinking that perhaps I was overfitting the data,
but the differences in the regression parameters remained.  Finally, I originally had lots
of other RHS variables, including indicator variables; stripping those variables out
did not change my findings, as shown below.

My first question is:  is there a recommended way to sort the data before running ppr?
In the meantime, I'll try sorting by my two continuous RHS variables to see if it makes
a difference -- not a definitive answer but it may be suggestive.
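
One way to check order sensitivity directly is to fit the same records in two orders
and compare the fitted values record by record.  The following is only a sketch under
assumptions: it uses the built-in rock data as a stand-in for dataset1, the variables
area1, peri1 and the idnum column are made up for illustration, and it calls ppr as found
in the stats package of current R versions.  It measures the discrepancy; it does not
explain where it comes from.

```r
# Stand-in data: built-in `rock`; idnum is a hypothetical record identifier.
set.seed(1)
dat <- transform(rock, area1 = area / 10000, peri1 = peri / 10000)
dat$idnum <- seq_len(nrow(dat))
shuffled <- dat[sample(nrow(dat)), ]      # same records, different row order

# Identical model specification on both orderings.
fit_a <- ppr(log(perm) ~ area1 + peri1 + shape, data = dat,
             nterms = 2, max.terms = 3)
fit_b <- ppr(log(perm) ~ area1 + peri1 + shape, data = shuffled,
             nterms = 2, max.terms = 3)

# Put both sets of fitted values into idnum order before comparing.
pred_a <- fitted(fit_a)[order(dat$idnum)]
pred_b <- fitted(fit_b)[order(shuffled$idnum)]

# Largest per-record disagreement between the two fits.
max_abs_diff <- max(abs(pred_a - pred_b))
print(max_abs_diff)
```

Comparing fitted values rather than direction vectors avoids being misled by fits that
differ only in how the terms are parameterized.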


My second question is:  is my method for deciding on the number of terms in the regression
okay?  What I am doing is first running ppr with a large maximum number of terms, then
finding the number of terms that minimizes the goodness-of-fit statistic.  Looking at the
cpus example in the section of MASS that deals with ppr (pp. 293-294), it is unclear why
eight terms were finally chosen, when using ten terms yields a lower goodness-of-fit
statistic.
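
The selection procedure described above can be sketched as follows.  This is an
illustration under assumptions, not a recommended rule: it again uses the built-in rock
data with made-up variables area1 and peri1, and it assumes (as the ppr documentation
states) that gofn holds the residual sum of squares indexed by number of terms, with
zeros for sizes that were not fitted.  Masking the zeros instead of dropping them keeps
the index equal to the number of terms even when the zeros are not all at the end.

```r
# Stand-in data: built-in `rock` with rescaled predictors (illustrative only).
dat <- transform(rock, area1 = area / 10000, peri1 = peri / 10000)

# Fit with a generous max.terms so gofn is reported for each model size.
fit <- ppr(log(perm) ~ area1 + peri1 + shape, data = dat,
           nterms = 1, max.terms = 5)

gof <- fit$gofn
gof[gof <= 0] <- NA        # zeros mark sizes that were not fitted
best <- which.min(gof)     # NAs are skipped; position = number of terms

# Refit asking for the selected number of terms.
refit <- ppr(log(perm) ~ area1 + peri1 + shape, data = dat,
             nterms = best, max.terms = 5)
print(best)
```

Because gofn is computed on the same data used for fitting, the size it favours may
differ from the one a held-out sample would favour.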

Finally, in the same example in MASS, where does the test.cpus() function come from?  I
couldn't find it in the MASS table of contents on CRAN.

Thanks in advance for any help you can give!



######DATASET1, ORIGINAL ORDER

> pprfile.ppr <- ppr(
+                award~
+                ilogemp+ilogage,
+                data=dataset1, nterms=1, max.terms=40, optlevel=3, bass=0
+      )
>     pprfile.ppr
Call:
ppr.formula(formula = award ~ ilogemp + ilogage, data = dataset1,
    nterms = 1, max.terms = 40, optlevel = 3, bass = 0)

Goodness of fit:
    1 terms     2 terms     3 terms     4 terms     5 terms     6 terms
94153840890 97278012020 86500905681 55028690369 53258726650 48006301832
    7 terms     8 terms     9 terms    10 terms    11 terms    12 terms
64844079166 44746090412  4502608255  2047094369  4361229925  2811887563
   13 terms    14 terms    15 terms    16 terms    17 terms    18 terms
 4960791975  2717103497  2265582134  2288868136  2470815605  4989632044
   19 terms    20 terms    21 terms    22 terms    23 terms    24 terms
 4966101666  3993722223  4000594447  3925383715  7636913238  7714228211
   25 terms    26 terms    27 terms    28 terms    29 terms    30 terms
 7378463928  7035211389  7007446263 10399858547           0           0
   31 terms    32 terms    33 terms    34 terms    35 terms    36 terms
          0           0           0           0           0           0
   37 terms    38 terms    39 terms    40 terms
          0           0           0           0
>     numterm <- which.min(pprfile.ppr$gofn[pprfile.ppr$gofn > 0])
>     summary(update(pprfile.ppr,nterms=numterm))
Call:
ppr.formula(formula = award ~ ilogemp + ilogage, data = dataset1,
    nterms = numterm, max.terms = 40, optlevel = 3, bass = 0)

Goodness of fit:
   10 terms    11 terms    12 terms    13 terms    14 terms    15 terms
 2047094369  4361229925  2811887563  4960791975  2717103497  2265582134
   16 terms    17 terms    18 terms    19 terms    20 terms    21 terms
 2288868136  2470815605  4989632044  4966101666  3993722223  4000594447
   22 terms    23 terms    24 terms    25 terms    26 terms    27 terms
 3925383715  7636913238  7714228211  7378463928  7035211389  7007446263
   28 terms    29 terms    30 terms    31 terms    32 terms    33 terms
10399858547           0           0           0           0           0
   34 terms    35 terms    36 terms    37 terms    38 terms    39 terms
          0           0           0           0           0           0
   40 terms
          0

Projection direction vectors:
        term 1      term 2      term 3      term 4      term 5      term 6
ilogemp -0.67134667 -0.02873846 -0.73893911  0.18766759 -0.32913203  0.79075441
ilogage -0.74114348  0.99958697  0.67377221 -0.98223260 -0.94428391 -0.61213353
        term 7      term 8      term 9      term 10
ilogemp -0.73097017  0.46223139 -0.28989935  0.43839056
ilogage  0.68240941 -0.88675935  0.95705714  0.89878458

Coefficients of ridge terms:
  term 1   term 2   term 3   term 4   term 5   term 6   term 7   term 8
53896.97 67906.48 33947.21 54279.37 61051.35 67225.76 65528.85 60914.63
  term 9  term 10
58372.10 86302.62
>
######DATASET1, IDNUM ORDER

> order3 <- order(dataset1$idnum)
> reorder1 <- dataset1[order3,]
> pprfile.ppr <- ppr(
+                award~
+                ilogemp+ilogage,
+                data=reorder1, nterms=1, max.terms=40, optlevel=3, bass=0
+      )
>     pprfile.ppr
Call:
ppr.formula(formula = award ~ ilogemp + ilogage, data = reorder1,
    nterms = 1, max.terms = 40, optlevel = 3, bass = 0)

Goodness of fit:
     1 terms      2 terms      3 terms      4 terms      5 terms      6 terms
104330711350  96448754204  82123267932  89241900763  62145006339  18715823257
     7 terms      8 terms      9 terms     10 terms     11 terms     12 terms
 13119857589  34779340331  30300680427  10845449181  25437895985  25390506630
    13 terms     14 terms     15 terms     16 terms     17 terms     18 terms
 25715475967  32977127206  33617404958  32855949359  31925878071  34135238643
    19 terms     20 terms     21 terms     22 terms     23 terms     24 terms
           0            0            0            0            0            0
    25 terms     26 terms     27 terms     28 terms     29 terms     30 terms
           0            0            0            0            0            0
    31 terms     32 terms     33 terms     34 terms     35 terms     36 terms
           0            0            0            0            0            0
    37 terms     38 terms     39 terms     40 terms
           0            0            0            0
>     numterm <- which.min(pprfile.ppr$gofn[pprfile.ppr$gofn > 0])
>     summary(update(pprfile.ppr,nterms=numterm))
Call:
ppr.formula(formula = award ~ ilogemp + ilogage, data = reorder1,
    nterms = numterm, max.terms = 40, optlevel = 3, bass = 0)

Goodness of fit:
   10 terms    11 terms    12 terms    13 terms    14 terms    15 terms
10845449181 25437895985 25390506630 25715475967 32977127206 33617404958
   16 terms    17 terms    18 terms    19 terms    20 terms    21 terms
32855949359 31925878071 34135238643           0           0           0
   22 terms    23 terms    24 terms    25 terms    26 terms    27 terms
          0           0           0           0           0           0
   28 terms    29 terms    30 terms    31 terms    32 terms    33 terms
          0           0           0           0           0           0
   34 terms    35 terms    36 terms    37 terms    38 terms    39 terms
          0           0           0           0           0           0
   40 terms
          0

Projection direction vectors:
        term 1      term 2      term 3      term 4      term 5      term 6
ilogemp  0.53566733 -0.03152731  0.25486277  0.19557035 -0.35437387  0.78441650
ilogage -0.84442910  0.99950289 -0.96697723 -0.98068967 -0.93510382 -0.62023444
        term 7      term 8      term 9      term 10
ilogemp -0.01470383  0.46360287 -0.27988056 -0.22789063
ilogage -0.99989189 -0.88604310  0.96003483  0.97368674

Coefficients of ridge terms:
  term 1   term 2   term 3   term 4   term 5   term 6   term 7   term 8
26233.34 60989.33 34132.51 64068.54 39630.44 40275.67 28783.82 27284.49
  term 9  term 10
40799.08 49967.01


######DATASET2, ORIGINAL ORDER
>
>
> pprfile.ppr <-  ppr(
+                award~
+                ilogemp+ilogage,
+                data=dataset2, nterms=1, max.terms=40, optlevel=3, bass=0
+      )
>     pprfile.ppr
Call:
ppr.formula(formula = award ~ ilogemp + ilogage, data = dataset2,
    nterms = 1, max.terms = 40, optlevel = 3, bass = 0)

Goodness of fit:
     1 terms      2 terms      3 terms      4 terms      5 terms      6 terms
107514555509  86364622071  76915236151  68363332758  62669868895  66108915227
     7 terms      8 terms      9 terms     10 terms     11 terms     12 terms
 10960310955  11415371966   8181125026   7370110042   7083942083   7695270325
    13 terms     14 terms     15 terms     16 terms     17 terms     18 terms
  9383626616  28487721161  25013255493  30617600484  33616699135  37629909488
    19 terms     20 terms     21 terms     22 terms     23 terms     24 terms
           0            0            0            0            0            0
    25 terms     26 terms     27 terms     28 terms     29 terms     30 terms
           0            0            0            0            0            0
    31 terms     32 terms     33 terms     34 terms     35 terms     36 terms
           0            0            0            0            0            0
    37 terms     38 terms     39 terms     40 terms
           0            0            0            0
>     numterm <- which.min(pprfile.ppr$gofn[pprfile.ppr$gofn > 0])
>     summary(update(pprfile.ppr,nterms=numterm))
Call:
ppr.formula(formula = award ~ ilogemp + ilogage, data = dataset2,
    nterms = numterm, max.terms = 40, optlevel = 3, bass = 0)

Goodness of fit:
   11 terms    12 terms    13 terms    14 terms    15 terms    16 terms
 7083942083  7695270325  9383626616 28487721161 25013255493 30617600484
   17 terms    18 terms    19 terms    20 terms    21 terms    22 terms
33616699135 37629909488           0           0           0           0
   23 terms    24 terms    25 terms    26 terms    27 terms    28 terms
          0           0           0           0           0           0
   29 terms    30 terms    31 terms    32 terms    33 terms    34 terms
          0           0           0           0           0           0
   35 terms    36 terms    37 terms    38 terms    39 terms    40 terms
          0           0           0           0           0           0

Projection direction vectors:
        term 1       term 2       term 3       term 4       term 5
ilogemp  0.740357413 -0.002414494 -0.549155789  0.189738006 -0.375109022
ilogage -0.672213435  0.999997085  0.835720000 -0.981834756 -0.926980702
        term 6       term 7       term 8       term 9       term 10
ilogemp  0.884994686  0.144491176  0.445318057 -0.279893879  0.249729588
ilogage -0.465601123 -0.989506089 -0.895372452  0.960030946 -0.968315616
        term 11
ilogemp -0.245536824
ilogage  0.969387264

Coefficients of ridge terms:
  term 1   term 2   term 3   term 4   term 5   term 6   term 7   term 8
19688.36 46872.70 48959.17 61038.99 50865.32 41610.74 29375.41 25582.86
  term 9  term 10  term 11
51523.50 34629.45 30547.80
>

######DATASET2, IDNUM ORDER

> order4 <- order(dataset2$idnum)
> reorder2 <- dataset2[order4,]
> pprfile.ppr <- ppr(
+                award~
+                ilogemp+ilogage,
+                data=reorder2, nterms=1, max.terms=40, optlevel=3, bass=0
+      )
>     pprfile.ppr
Call:
ppr.formula(formula = award ~ ilogemp + ilogage, data = reorder2,
    nterms = 1, max.terms = 40, optlevel = 3, bass = 0)

Goodness of fit:
     1 terms      2 terms      3 terms      4 terms      5 terms      6 terms
104330711350  96448754204  82123267932  89241900763  62145006339  18715823257
     7 terms      8 terms      9 terms     10 terms     11 terms     12 terms
 13119857589  34779340331  30300680427  10845449181  25437895985  25390506630
    13 terms     14 terms     15 terms     16 terms     17 terms     18 terms
 25715475967  32977127206  33617404958  32855949359  31925878071  34135238643
    19 terms     20 terms     21 terms     22 terms     23 terms     24 terms
           0            0            0            0            0            0
    25 terms     26 terms     27 terms     28 terms     29 terms     30 terms
           0            0            0            0            0            0
    31 terms     32 terms     33 terms     34 terms     35 terms     36 terms
           0            0            0            0            0            0
    37 terms     38 terms     39 terms     40 terms
           0            0            0            0
>     numterm <- which.min(pprfile.ppr$gofn[pprfile.ppr$gofn > 0])
>     summary(update(pprfile.ppr,nterms=numterm))
Call:
ppr.formula(formula = award ~ ilogemp + ilogage, data = reorder2,
    nterms = numterm, max.terms = 40, optlevel = 3, bass = 0)

Goodness of fit:
   10 terms    11 terms    12 terms    13 terms    14 terms    15 terms
10845449181 25437895985 25390506630 25715475967 32977127206 33617404958
   16 terms    17 terms    18 terms    19 terms    20 terms    21 terms
32855949359 31925878071 34135238643           0           0           0
   22 terms    23 terms    24 terms    25 terms    26 terms    27 terms
          0           0           0           0           0           0
   28 terms    29 terms    30 terms    31 terms    32 terms    33 terms
          0           0           0           0           0           0
   34 terms    35 terms    36 terms    37 terms    38 terms    39 terms
          0           0           0           0           0           0
   40 terms
          0

Projection direction vectors:
        term 1      term 2      term 3      term 4      term 5      term 6
ilogemp  0.53566733 -0.03152731  0.25486277  0.19557035 -0.35437387  0.78441650
ilogage -0.84442910  0.99950289 -0.96697723 -0.98068967 -0.93510382 -0.62023444
        term 7      term 8      term 9      term 10
ilogemp -0.01470383  0.46360287 -0.27988056 -0.22789063
ilogage -0.99989189 -0.88604310  0.96003483  0.97368674

Coefficients of ridge terms:
  term 1   term 2   term 3   term 4   term 5   term 6   term 7   term 8
26233.34 60989.33 34132.51 64068.54 39630.44 40275.67 28783.82 27284.49
  term 9  term 10
40799.08 49967.01
