[Rd] Proposed Patch for poly.Rd
Martin Maechler
maechler at stat.math.ethz.ch
Sat Jul 15 17:37:16 CEST 2017
>>>>> Marc Schwartz <marc_schwartz at me.com>
>>>>> on Fri, 14 Jul 2017 11:01:03 -0500 writes:
>> On Jul 14, 2017, at 9:50 AM, Martin Maechler
>> <maechler at stat.math.ethz.ch> wrote:
>>
>>>>>>> Martin Maechler <maechler at stat.math.ethz.ch> on Fri,
>>>>>>> 14 Jul 2017 16:30:50 +0200 writes:
>>
>>>>>>> Marc Schwartz <marc_schwartz at me.com> on Fri, 14 Jul
>>>>>>> 2017 06:57:26 -0500 writes:
>>
>>>>> On Jul 13, 2017, at 5:07 PM, Marc Schwartz
>>>>> <marc_schwartz at me.com> wrote:
>>>>>
>>>>>
>>>>> On Jul 13, 2017, at 3:37 PM, Marc Schwartz
>>>>> <marc_schwartz at me.com> wrote:
>>>>>
>>>>>
>>>>>>> On Jul 13, 2017, at 3:22 PM, Duncan Murdoch
>>>>>>> <murdoch.duncan at gmail.com> wrote:
>>>>>>>
>>>>>>> On 13/07/2017 4:08 PM, Marc Schwartz wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> As per the discussion today on R-Help:
>>>>>>>>
>>>>>>>> https://stat.ethz.ch/pipermail/r-help/2017-July/448132.html
>>>>>>>>
>>>>>>>> I am attaching a proposed patch for poly.Rd to
>>>>>>>> provide clarifying wording relative to naming the
>>>>>>>> 'degree' argument explicitly, in the case where the
>>>>>>>> 'x' argument is a matrix, rather than a vector.
>>>>>>>>
>>>>>>>> This is based upon the svn trunk version of
>>>>>>>> poly.Rd.
>>>>>>>
>>>>>>> I don't think this is the right fix. The use of the
>>>>>>> unnamed 2nd arg as degree happens whether the first
>>>>>>> arg is a matrix or not.
>>>>>>>
>>>>>>> I didn't read the whole thread in detail, but it
>>>>>>> appears there's a bug somewhere, in the report or in
>>>>>>> the poly() code or in the plsr() code. That bug
>>>>>>> should be reported on the bug list if it turns out
>>>>>>> to be in base R, and to the package maintainer if it
>>>>>>> is in plsr().
>>>>>>>
>>>>>>> Duncan Murdoch
>>>>>
>>>>>
>>>>> Duncan,
>>>>>
>>>>> Thanks for your reply. You only really need to read that
>>>>>>> last post in the thread linked to above.
>>>>>
>>>>> I won't deny the possibility of a bug in poly(), relative
>>>>>>> to the handling of 'x' as a matrix. The behavior
>>>>>>> occurs in the poly() function in a pure stand alone
>>>>>>> fashion, without the need for plsr():
>>>>>
>>>>> x1 <- runif(20)
>>>>> x2 <- runif(20)
>>>>> mx <- cbind(x1, x2)
>>>>>
>>>>>
>>>>> <snip>
>>>>>
>>>>> Duncan,
>>>>>
>>>>> Tracing through the code for poly() using debug once
>>>>> with:
>>>>>
>>>>> poly(mx, 2)
>>>>>
>>>>> and then with:
>>>>>
>>>>> poly(mx, degree = 2)
>>>>>
>>>>> there is a difference in the transformation of 'mx'
>>>>> internally by the use of:
>>>>>
>>>>> if (is.matrix(x)) { m <-
>>>>> unclass(as.data.frame(cbind(x, ...)))
>>>>> return(do.call(polym, c(m, degree = degree, raw = raw,
>>>>> list(coefs = coefs)))) }
>>>>>
>>>>>
>>>>> In the first case, 'mx' ends up being transformed to:
>>>>>
>>>>> Browse[2]> m $x1 [1] 0.99056941 0.13953093 0.38965567
>>>>> 0.35353514 0.90838486 0.97552474 [7] 0.01135743
>>>>> 0.06537047 0.56207834 0.50554056 0.96653391 0.69533973
>>>>> [13] 0.31333549 0.97488211 0.54952630 0.71747157
>>>>> 0.31164777 0.81694822 [19] 0.58641410 0.08858699
>>>>>
>>>>> $x2 [1] 0.6628658 0.9221436 0.3162418 0.8494452
>>>>> 0.4665010 0.3403719 [7] 0.4040692 0.4916650 0.9091161
>>>>> 0.2956006 0.3454689 0.3331070 [13] 0.8788974 0.5614636
>>>>> 0.7794396 0.2304009 0.6566537 0.6875646 [19] 0.5110733
>>>>> 0.4122336
>>>>>
>>>>> $V3 [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>>>>
>>>>> attr(,"row.names") [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
>>>>> 14 15 16 17 18 19 20
>>>>>
>>>>> Thus, when do.call() is used, m$V3 is passed as the
>>>>> 'x' argument on the third iteration, essentially
>>>>> resulting in:
>>>>>
>>>>> polym(rep(2, 20), degree = 2) Error in poly(dots[[1L]],
>>>>> degree, raw = raw, simple = raw && nd > 1) : 'degree'
>>>>> must be less than number of unique points
>>>>>
>>>>>
>>>>> Note also that in this case, 'dots', which is the
>>>>> result of using list(...) on the initial call, is:
>>>>>
>>>>> Browse[2]> dots [[1]] [1] 2
>>>>>
>>>>>
>>>>> In the second case:
>>>>>
>>>>> Browse[2]> m $x1 [1] 0.99056941 0.13953093 0.38965567
>>>>> 0.35353514 0.90838486 0.97552474 [7] 0.01135743
>>>>> 0.06537047 0.56207834 0.50554056 0.96653391 0.69533973
>>>>> [13] 0.31333549 0.97488211 0.54952630 0.71747157
>>>>> 0.31164777 0.81694822 [19] 0.58641410 0.08858699
>>>>>
>>>>> $x2 [1] 0.6628658 0.9221436 0.3162418 0.8494452
>>>>> 0.4665010 0.3403719 [7] 0.4040692 0.4916650 0.9091161
>>>>> 0.2956006 0.3454689 0.3331070 [13] 0.8788974 0.5614636
>>>>> 0.7794396 0.2304009 0.6566537 0.6875646 [19] 0.5110733
>>>>> 0.4122336
>>>>>
>>>>> attr(,"row.names") [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
>>>>> 14 15 16 17 18 19 20
>>>>>
>>>>>
>>>>> So, there is no m$V3.
>>>>>
>>>>> Note also that 'dots' ends up being:
>>>>>
>>>>> Browse[2]> dots list()
>>>>>
>>>>>
>>>>> In both cases, 'degree' is indeed 2, but the result of
>>>>> 'list(...)' on the initial function call is quite
>>>>> different.
>>>>>
>>>>> So, I may be hypo-caffeinated, but if there is a bug
>>>>> here, it may be due to the way in which cbind() is
>>>>> being called in the code above, where the three dots
>>>>> are being used?
>>>>>
>>>>> I can replicate the presumably correct behavior by
>>>>> using:
>>>>>
>>>>> m <- unclass(as.data.frame(cbind(x)))
>>>>>
>>>>> instead of:
>>>>>
>>>>> m <- unclass(as.data.frame(cbind(x, ...)))
>>>>>
>>>>> But I am not sure if removing the three dots in the
>>>>> cbind() call may have other unintended consequences.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Marc
>>
>>
>>>> Duncan,
>>
>>>> Some additional information here. Reviewing the source
>>>> code for the function in SVN:
>>
>>>> https://svn.r-project.org/R/trunk/src/library/stats/R/contr.poly.R
>>
>>>> there is a relevant comment in the code:
>>
>>>> if(is.matrix(x)) { ## FIXME: fails when combined with
>>>> 'unnamed degree' above m <-
>>>> unclass(as.data.frame(cbind(x, ...)))
>>>> return(do.call(polym, c(m, degree = degree, raw = raw,
>>>> list(coefs=coefs)))) }
>>
>>
>>>> A version review would suggest that the above comment
>>>> was added to the code back in 2015.
>>
>>> Yes, by me, possibly here :
>>
>>> $ svn log -v -c68727
>>> ------------------------------------------------------------------------
>>> r68727 | maechler | 2015-07-23 16:14:59 +0200 (Thu, 23
>>> Jul 2015) | 1 line Changed paths: M /trunk/doc/NEWS.Rd M
>>> /trunk/src/library/stats/R/contr.poly.R M
>>> /trunk/src/library/stats/man/poly.Rd M
>>> /trunk/tests/Examples/stats-Ex.Rout.save M
>>> /trunk/tests/reg-tests-1c.R
>>
>>> poly(), polym() now work better notably for prediction
>>> ------------------------------------------------------------------------
>>> $ svn-diffB -c68727 doc/NEWS.Rd Index: doc/NEWS.Rd
>>> ===================================================================
>>> 126a127,133
>>>>
>>>> \item \code{polym()} gains a \code{coefs = NULL}
>>>> argument and returns class \code{"poly"} just like
>>>> \code{poly()} which gets a new \code{simple=FALSE}
>>>> option. They now lead to correct \code{predict()}ions,
>>>> e.g., on subsets of the original data. %% see
>>>> https://stat.ethz.ch/pipermail/r-devel/2015-July/071532.html
>>
>>
>>>> So it would appear that the behavior being discussed
>>>> here is known.
>>
>>> Indeed! I remember to have spent quite a few hours with
>>> the code and its different uses before committing that
>>> patch.
>>
>>>> I am still confused by the need for the '...' in the
>>>> call to cbind(), which as far as I can tell, has been
>>>> in the code at least back to 2003, when the poly() code
>>>> was split from base.
>>
>>>> I am not sure why one would want to pass on other '...'
>>>> arguments to cbind(), but I am presumably missing
>>>> something here.
>>
>>> Yes, I think passing the '...' is important there...
>>> OTOH, I'm almost sure that I wrote the 'FIXME' because I
>>> thought one should be able to do things better. So, I'm
>>> happy to e-talk to you about how to get rid of the FIXME
>>> and still remain back-compatible: Notably with the
>>> paragraph in ?poly |> Details:
>>> |>
>>> |> Although formally ‘degree’ should be named (as it
>>> follows ‘...’), |> an unnamed second argument of length
>>> 1 will be interpreted as the |> degree, such that
>>> ‘poly(x, 3)’ can be used in formulas.
>>
>> As a matter of fact, a patch seems very simple, and I am
>> testing it now.
>>
>> Won't have much more time today, but will return "on this
>> channel" later, maybe tomorrow.
>>
>> Martin
> Martin,
> Thanks for taking the time to look at this!
> Marc
Duncan had in the mean time filed a bug report about this,
--> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17310
but I had fixed the issue even before seeing the PR.
[currently fixed in R-devel only (svn r 72919)]
Martin
More information about the R-devel
mailing list