[Rd] Proposed Patch for poly.Rd

Duncan Murdoch murdoch.duncan at gmail.com
Mon Jul 17 12:32:15 CEST 2017


On 17/07/2017 2:57 AM, Martin Maechler wrote:
>>>>>> Duncan Murdoch <murdoch.duncan at gmail.com>
>>>>>>     on Sat, 15 Jul 2017 19:27:57 -0400 writes:
>
>     > On 15/07/2017 11:37 AM, Martin Maechler wrote:
>     >>>>>>> Marc Schwartz <marc_schwartz at me.com>
>     >>>>>>> on Fri, 14 Jul 2017 11:01:03 -0500 writes:
>     >>
>     >> >> On Jul 14, 2017, at 9:50 AM, Martin Maechler
>     >> >> <maechler at stat.math.ethz.ch> wrote:
>     >> >>
>     >> >>>>>>> Martin Maechler <maechler at stat.math.ethz.ch> on Fri,
>     >> >>>>>>> 14 Jul 2017 16:30:50 +0200 writes:
>     >> >>
>     >> >>>>>>> Marc Schwartz <marc_schwartz at me.com> on Fri, 14 Jul
>     >> >>>>>>> 2017 06:57:26 -0500 writes:
>     >> >>
>     >> >>>>> On Jul 13, 2017, at 5:07 PM, Marc Schwartz
>     >> >>>>> <marc_schwartz at me.com> wrote:
>     >> >>>>>
>     >> >>>>>
>     >>>>>>> On Jul 13, 2017, at 3:37 PM, Marc Schwartz
>     >> >>>>> <marc_schwartz at me.com> wrote:
>     >>>>>>>
>     >>>>>>>
>     >> >>>>>>> On Jul 13, 2017, at 3:22 PM, Duncan Murdoch
>     >> >>>>>>> <murdoch.duncan at gmail.com> wrote:
>     >> >>>>>>>
>     >> >>>>>>> On 13/07/2017 4:08 PM, Marc Schwartz wrote:
>     >> >>>>>>>> Hi All,
>     >> >>>>>>>>
>     >> >>>>>>>> As per the discussion today on R-Help:
>     >> >>>>>>>>
>     >> >>>>>>>> https://stat.ethz.ch/pipermail/r-help/2017-July/448132.html
>     >> >>>>>>>>
>     >> >>>>>>>> I am attaching a proposed patch for poly.Rd to
>     >> >>>>>>>> provide clarifying wording relative to naming the
>     >> >>>>>>>> 'degree' argument explicitly, in the case where the
>     >> >>>>>>>> 'x' argument is a matrix, rather than a vector.
>     >> >>>>>>>>
>     >> >>>>>>>> This is based upon the svn trunk version of
>     >> >>>>>>>> poly.Rd.
>     >> >>>>>>>
>     >> >>>>>>> I don't think this is the right fix.  The use of the
>     >> >>>>>>> unnamed 2nd arg as degree happens whether the first
>     >> >>>>>>> arg is a matrix or not.
>     >> >>>>>>>
>     >> >>>>>>> I didn't read the whole thread in detail, but it
>     >> >>>>>>> appears there's a bug somewhere, in the report or in
>     >> >>>>>>> the poly() code or in the plsr() code. That bug
>     >> >>>>>>> should be reported on the bug list if it turns out
>     >> >>>>>>> to be in base R, and to the package maintainer if it
>     >> >>>>>>> is in plsr().
>     >> >>>>>>>
>     >> >>>>>>> Duncan Murdoch
>     >>>>>>>
>     >>>>>>>
>     >>>>>>> Duncan,
>     >>>>>>>
>     >>>>>>> Thanks for your reply. You only really need to read that
>     >> >>>>>>> last post in the thread linked to above.
>     >>>>>>>
>     >>>>>>> I won't deny the possibility of a bug in poly(), relative
>     >> >>>>>>> to the handling of 'x' as a matrix. The behavior
>     >> >>>>>>> occurs in the poly() function in a pure stand alone
>     >> >>>>>>> fashion, without the need for plsr():
>     >>>>>>>
>     >>>>>>> x1 <- runif(20)
>     >>>>>>> x2 <- runif(20)
>     >>>>>>> mx <- cbind(x1, x2)
>     >>>>>>>
>     >> >>>>>
>     >> >>>>> <snip>
>     >> >>>>>
>     >> >>>>> Duncan,
>     >> >>>>>
>     >> >>>>> Tracing through the code for poly() using debug once
>     >> >>>>> with:
>     >> >>>>>
>     >> >>>>> poly(mx, 2)
>     >> >>>>>
>     >> >>>>> and then with:
>     >> >>>>>
>     >> >>>>> poly(mx, degree = 2)
>     >> >>>>>
>     >> >>>>> there is a difference in the transformation of 'mx'
>     >> >>>>> internally by the use of:
>     >> >>>>>
>     >> >>>>> if (is.matrix(x)) { m <-
>     >> >>>>> unclass(as.data.frame(cbind(x, ...)))
>     >> >>>>> return(do.call(polym, c(m, degree = degree, raw = raw,
>     >> >>>>> list(coefs = coefs)))) }
>     >> >>>>>
>     >> >>>>>
>     >> >>>>> In the first case, 'mx' ends up being transformed to:
>     >> >>>>>
>     >> >>>>> Browse[2]> m $x1 [1] 0.99056941 0.13953093 0.38965567
>     >> >>>>> 0.35353514 0.90838486 0.97552474 [7] 0.01135743
>     >> >>>>> 0.06537047 0.56207834 0.50554056 0.96653391 0.69533973
>     >> >>>>> [13] 0.31333549 0.97488211 0.54952630 0.71747157
>     >> >>>>> 0.31164777 0.81694822 [19] 0.58641410 0.08858699
>     >> >>>>>
>     >> >>>>> $x2 [1] 0.6628658 0.9221436 0.3162418 0.8494452
>     >> >>>>> 0.4665010 0.3403719 [7] 0.4040692 0.4916650 0.9091161
>     >> >>>>> 0.2956006 0.3454689 0.3331070 [13] 0.8788974 0.5614636
>     >> >>>>> 0.7794396 0.2304009 0.6566537 0.6875646 [19] 0.5110733
>     >> >>>>> 0.4122336
>     >> >>>>>
>     >> >>>>> $V3 [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>     >> >>>>>
>     >> >>>>> attr(,"row.names") [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
>     >> >>>>> 14 15 16 17 18 19 20
>     >> >>>>>
>     >> >>>>> Thus, when do.call() is used, m$V3 is passed as the
>     >> >>>>> 'x' argument on the third iteration, essentially
>     >> >>>>> resulting in:
>     >> >>>>>
>     >>>>>>> polym(rep(2, 20), degree = 2) Error in poly(dots[[1L]],
>     >> >>>>> degree, raw = raw, simple = raw && nd > 1) : 'degree'
>     >> >>>>> must be less than number of unique points
>     >> >>>>>
>     >> >>>>>
>     >> >>>>> Note also that in this case, 'dots', which is the
>     >> >>>>> result of using list(...) on the initial call, is:
>     >> >>>>>
>     >> >>>>> Browse[2]> dots [[1]] [1] 2
>     >> >>>>>
>     >> >>>>>
>     >> >>>>> In the second case:
>     >> >>>>>
>     >> >>>>> Browse[2]> m $x1 [1] 0.99056941 0.13953093 0.38965567
>     >> >>>>> 0.35353514 0.90838486 0.97552474 [7] 0.01135743
>     >> >>>>> 0.06537047 0.56207834 0.50554056 0.96653391 0.69533973
>     >> >>>>> [13] 0.31333549 0.97488211 0.54952630 0.71747157
>     >> >>>>> 0.31164777 0.81694822 [19] 0.58641410 0.08858699
>     >> >>>>>
>     >> >>>>> $x2 [1] 0.6628658 0.9221436 0.3162418 0.8494452
>     >> >>>>> 0.4665010 0.3403719 [7] 0.4040692 0.4916650 0.9091161
>     >> >>>>> 0.2956006 0.3454689 0.3331070 [13] 0.8788974 0.5614636
>     >> >>>>> 0.7794396 0.2304009 0.6566537 0.6875646 [19] 0.5110733
>     >> >>>>> 0.4122336
>     >> >>>>>
>     >> >>>>> attr(,"row.names") [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
>     >> >>>>> 14 15 16 17 18 19 20
>     >> >>>>>
>     >> >>>>>
>     >> >>>>> So, there is no m$V3.
>     >> >>>>>
>     >> >>>>> Note also that 'dots' ends up being:
>     >> >>>>>
>     >> >>>>> Browse[2]> dots list()
>     >> >>>>>
>     >> >>>>>
>     >> >>>>> In both cases, 'degree' is indeed 2, but the result of
>     >> >>>>> 'list(...)' on the initial function call is quite
>     >> >>>>> different.
>     >> >>>>>
>     >> >>>>> So, I may be hypo-caffeinated, but if there is a bug
>     >> >>>>> here, it may be due to the way in which cbind() is
>     >> >>>>> being called in the code above, where the three dots
>     >> >>>>> are being used?
>     >> >>>>>
>     >> >>>>> I can replicate the presumably correct behavior by
>     >> >>>>> using:
>     >> >>>>>
>     >> >>>>> m <- unclass(as.data.frame(cbind(x)))
>     >> >>>>>
>     >> >>>>> instead of:
>     >> >>>>>
>     >> >>>>> m <- unclass(as.data.frame(cbind(x, ...)))
>     >> >>>>>
>     >> >>>>> But I am not sure if removing the three dots in the
>     >> >>>>> cbind() call may have other unintended consequences.
>     >> >>>>>
>     >> >>>>> Regards,
>     >> >>>>>
>     >> >>>>> Marc
>     >> >>
>     >> >>
>     >> >>>> Duncan,
>     >> >>
>     >> >>>> Some additional information here.  Reviewing the source
>     >> >>>> code for the function in SVN:
>     >> >>
>     >> >>>> https://svn.r-project.org/R/trunk/src/library/stats/R/contr.poly.R
>     >> >>
>     >> >>>> there is a relevant comment in the code:
>     >> >>
>     >> >>>> if(is.matrix(x)) { ## FIXME: fails when combined with
>     >> >>>> 'unnamed degree' above m <-
>     >> >>>> unclass(as.data.frame(cbind(x, ...)))
>     >> >>>> return(do.call(polym, c(m, degree = degree, raw = raw,
>     >> >>>> list(coefs=coefs)))) }
>     >> >>
>     >> >>
>     >> >>>> A version review would suggest that the above comment
>     >> >>>> was added to the code back in 2015.
>     >> >>
>     >> >>> Yes, by me, possibly here :
>     >> >>
>     >> >>> $ svn log -v -c68727
>     >> >>> ------------------------------------------------------------------------
>     >> >>> r68727 | maechler | 2015-07-23 16:14:59 +0200 (Thu, 23
>     >> >>> Jul 2015) | 1 line Changed paths: M /trunk/doc/NEWS.Rd M
>     >> >>> /trunk/src/library/stats/R/contr.poly.R M
>     >> >>> /trunk/src/library/stats/man/poly.Rd M
>     >> >>> /trunk/tests/Examples/stats-Ex.Rout.save M
>     >> >>> /trunk/tests/reg-tests-1c.R
>     >> >>
>     >> >>> poly(), polym() now work better notably for prediction
>     >> >>> ------------------------------------------------------------------------
>     >> >>> $ svn-diffB -c68727 doc/NEWS.Rd Index: doc/NEWS.Rd
>     >> >>> ===================================================================
>     >> >>> 126a127,133
>     >> >>>>
>     >> >>>> \item \code{polym()} gains a \code{coefs = NULL}
>     >> >>>> argument and returns class \code{"poly"} just like
>     >> >>>> \code{poly()} which gets a new \code{simple=FALSE}
>     >> >>>> option.  They now lead to correct \code{predict()}ions,
>     >> >>>> e.g., on subsets of the original data.  %% see
>     >> >>>> https://stat.ethz.ch/pipermail/r-devel/2015-July/071532.html
>     >> >>
>     >> >>
>     >> >>>> So it would appear that the behavior being discussed
>     >> >>>> here is known.
>     >> >>
>     >> >>> Indeed!  I remember to have spent quite a few hours with
>     >> >>> the code and its different uses before committing that
>     >> >>> patch.
>     >> >>
>     >> >>>> I am still confused by the need for the '...' in the
>     >> >>>> call to cbind(), which as far as I can tell, has been
>     >> >>>> in the code at least back to 2003, when the poly() code
>     >> >>>> was split from base.
>     >> >>
>     >> >>>> I am not sure why one would want to pass on other '...'
>     >> >>>> arguments to cbind(), but I am presumably missing
>     >> >>>> something here.
>     >> >>
>     >> >>> Yes, I think passing the '...' is important there...
>     >> >>> OTOH, I'm almost sure that I wrote the 'FIXME' because I
>     >> >>> thought one should be able to do things better.  So, I'm
>     >> >>> happy to e-talk to you about how to get rid of the FIXME
>     >> >>> and still remain back-compatible: Notably with the
>     >> >>> paragraph in ?poly |> Details:
>     >> >>> |>
>     >> >>> |> Although formally ‘degree’ should be named (as it
>     >> >>> follows ‘...’), |> an unnamed second argument of length
>     >> >>> 1 will be interpreted as the |> degree, such that
>     >> >>> ‘poly(x, 3)’ can be used in formulas.
>     >> >>
>     >> >> As a matter of fact, a patch seems very simple, and I am
>     >> >> testing it now.
>     >> >>
>     >> >> Won't have much more time today, but will return "on this
>     >> >> channel" later, maybe tomorrow.
>     >> >>
>     >> >> Martin
>     >>
>     >>
>     >> > Martin,
>     >> > Thanks for taking the time to look at this!
>     >>
>     >> > Marc
>     >>
>     >> Duncan had in the mean time filed a bug report about this,
>     --> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17310
>     >> but I had fixed the issue even before seeing the PR.
>     >> [currently fixed in R-devel only (svn r 72919)]
>
>     > I wrote to you the next day, when Marc pointed out the FIXME comment.
>     > Did you not receive my message?
>
>     > Duncan Murdoch
>
> I'm sorry Duncan,
> there must have been messages crossing each other, possibly
> delayed on this end, where some (including me) are using a new
> spam/virus filtering service.
> Yes, I saw that too.. but also quite a bit *after* having
> replied on R-devel that I was the author of the FIXME and that I
> was going to look into the issue...
> ... and then I did look into the issue instead of checking my
> (almost always much too many) e-mails.
>
> But there's no problem about this, right?
> I'm sorry if I gave the impression in some way.
> It's good to have a bug report that we can quickly close isn't it?

No, there's no problem.  I was just concerned because it sounded as 
though you hadn't received it; if messages were getting lost, that would 
be a problem.

Duncan Murdoch



More information about the R-devel mailing list