[Rd] Improvements (?) in stats::poly and stats::polym.

Martin Maechler maechler at stat.math.ethz.ch
Fri Jul 17 18:00:28 CEST 2015


Dear Keith,

>>>>>   <Keith.Jewell at campdenbri.co.uk>
>>>>>     on Thu, 16 Jul 2015 08:58:11 +0000 writes:

    > Dear R Core Team,
    > Last week I made a post to the R-help mailing list
    > “predict.poly for multivariate data”
    > <https://stat.ethz.ch/pipermail/r-help/2015-July/430311.html>
    > but it has had no responses so I’m sending this to the
    > email address of package:stats maintainer. Please feel
    > free to tell me that this is inappropriate.

Asking R Core in your case is ok ...
{ though still slightly "sub optimal" (but *not* "inappropriate"!):
  Ideallly you'd have followed the posting guide
  (http://www.r-project.org/posting-guide.html) here,
  namely to send your original post to R-devel instead of R-help.
  Then it would have been noticed by me and most probably
  several other R core members ...
}

    > IMHO the reproducible code I presented in that post:
    > #############
    > library(datasets)
    > alm <- lm(stack.loss ~ poly(Air.Flow, Water.Temp, degree=3), stackloss)
    > alm$fitted.values[1:10]   # "correct" prediction values [1:10]
    > predict(alm, stackloss)[1:10]  #  gives correct values
    > predict(alm, stackloss[1:10,]) #  gives wrong values
    > #########
    > ... clearly demonstrates something wrong, the two predicts should not differ.

    > I hesitate to call it a bug, it might be viewed as inappropriate usage. But it's easy to get wrong answers, fairly small changes to poly and polym correct the wrongness, and I think the changes are backwards compatible. Perhaps appending the altered codes made the R-help post too long for easy comprehension, I attach them to this email.

Thank you!

I had started to look at your R-help post and noticed that you
changed the *printout* of the R functions, instead of the
source 
The current development version of that part of the R
source code is always at
  https://svn.r-project.org/R/trunk/src/library/stats/R/contr.poly.R

and if you look carefully, you see that there are comments in
the sources that are lost in the process (of parsing,
byte-compiling, saving in binary, ....),
but never mind:
you've marked your changes well and I can use your version to
modify the sources.

>From what I've understood, the changes make much sense and look
good; and if no problem surfaces should make it into R - with an
acknowledgement to you, of course.

Thank you, indeed,
Martin
-------

Part 2:

    > While I'm writing, for didactic purposes I've sometimes
    > wanted to express the orthogonal polynomials in the
    > conventional form: 
    > 		   b0.x^0 + b1.x^1 + b2.x^2 + ...
    > ... rather than in terms of " the centering and
    > normalization constants used in constructing the
    > orthogonal polynomials". 

    > Over the years in the various R forums others have made
    > similar requests to be told that one can: 

    > (a) calculate the polynomials using poly.predict(), so the
    >     conventional form coefficients aren't needed; 
    > (b) see the algorithm in the code and/or Kennedy & Gentle (1980, pp. 343–4).


    > (a) doesn't meet my didactic needs.

    > With respect to (b) I could see how to calculate polynomials for given x value but I just got lost in the algebra trying to deduce the conventional form coefficients :-{

    > Kennedy & Gentle refer to "solving for x in p(x)" which to
    > my simple mind suggested `lm' and led to the truly
    > horrible approach implemented in the attached 
    > poly-orth2raw.R. 

    > I fully accept that there must be a better, more direct,
    > way to deduce the conventional form coefficients from the
    > centering and normalisation constants but I can't work it
    > out, and this approach seems to work.

I'm not really commenting on the above issue, and notably your
orth2raw implementation at all.
My vague recollection would be that indeed, it has been sometimes
desirable, if only for didactical reasons, to better explain the
meaning of the orthogonal polynomial basis.

OTOH, notably if you think of high degree polynomials(10 already may be
"high" here), it can even be "dangerous" to hand down the
coefficients wrt to the (1, x, x^2, .., x^p) basis to the end
user, because even using them in prediction may be numerical
quite unstable {but then one, me included, would argue that
using degree 10 polynomials is typically nonsense and should be
replaced by using regression/smoothing splines}...

    > With best regards,

    > Keith Jewell – Head of Statistics Group
    > Campden BRI Group

    > Tel       + 44(0)1386 842055
    > Email   keith.jewell at campdenbri.co.uk
    > Web    www.campdenbri.co.uk
    > Site      Station Road, Chipping Campden, Gloucestershire, GL55 6LD, UK

    > [............ ca 20 lines of legal gibberish + anti-virus ad .. ]

    > ____________________________________________________________
    > x[DELETED ATTACHMENT poly.R, Untyped binary data]
    > x[DELETED ATTACHMENT polym.R, Untyped binary data]
    > x[DELETED ATTACHMENT orth2raw.R, Untyped binary data]



More information about the R-devel mailing list