[Rd] Why does poly work for unordered factors?
Roland Fuß
Mon Oct 27 06:57:19 CET 2025
Great. Thank you, Martin.
Please credit SO users Christoph and SamR instead of me. I was just
brave enough to send an email to the list. Yes, spam is one of the
reasons we can't have nice things, but it's unfortunate that there isn't
a low-barrier way to report bugs. Sending your first email to this
list is scary.
Roland
On 25.10.2025 at 12:55, Martin Maechler wrote:
>>>>>> Deepayan Sarkar
>>>>>> on Wed, 22 Oct 2025 16:43:49 +0530 writes:
> > On Wed, 22 Oct 2025 at 14:41, Martin Maechler
> > <maechler using stat.math.ethz.ch> wrote:
> >>
> >> >>>>> Roland Fuß via R-devel
> >> >>>>> on Wed, 22 Oct 2025 10:24:07 +0200 writes:
> >>
> >> > This doesn't seem intended.
> >>
> >> You are right. The code change, reverting to previous behaviour
> >> notably for "Date",
> >> was prompted on this R-devel list,
> >> https://stat.ethz.ch/pipermail/r-devel/2022-July/081850.html
> >>
> >> But it was overlooked (by me and everyone else) that the change also
> >> allows poly(<factor>, .) to work; that is a bug we will fix.
> >>
> >> > See:
> >>
> >> > https://stackoverflow.com/questions/79795583/why-does-poly-work-for-unordered-factors-it-previously-did-not-work
> >>
> >> As was already raised in the above SO thread,
> >> what should happen for *ordered* factors is less obvious.
> >> A warning was proposed, but I thought that this was too harsh;
> >> hence, we could use message(), or just keep allowing it.
> >>
> >> Opinions?
>
>
> > Given that we use contr.poly by default for ordered factors, I think
> > it's very natural to allow it (without even a message). In fact, it
> > would be a nice way to illustrate what contr.poly does; e.g.,
>
> > > y <- rnorm(100); g <- gl(5, 20, ordered = TRUE)
> > > summary(lm(y ~ g)) |> coefficients()
> >                 Estimate Std. Error     t value  Pr(>|t|)
> > (Intercept)  0.138970785  0.1020089  1.36233970 0.1763120
> > g.L         -0.182590696  0.2280989 -0.80048932 0.4254247
> > g.Q         -0.206493256  0.2280989 -0.90527968 0.3676074
> > g.C          0.003626904  0.2280989  0.01590058 0.9873471
> > g^4         -0.074807753  0.2280989 -0.32796199 0.7436621
> > > summary(lm(y ~ poly(g, 4))) |> coefficients()
> >                Estimate Std. Error     t value  Pr(>|t|)
> > (Intercept)  0.13897078  0.1020089  1.36233970 0.1763120
> > poly(g, 4)1 -0.81657042  1.0200891 -0.80048932 0.4254247
> > poly(g, 4)2 -0.92346592  1.0200891 -0.90527968 0.3676074
> > poly(g, 4)3  0.01622001  1.0200891  0.01590058 0.9873471
> > poly(g, 4)4 -0.33455044  1.0200891 -0.32796199 0.7436621
>
> > Best,
> > -Deepayan
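
The two fits quoted above have identical t values, and their estimates differ by a constant factor. A minimal sketch of why, under a few assumptions: the seed is arbitrary (the thread gives none), the contrasts are set explicitly rather than taken from options(), and as.integer(g) stands in for g so that the code runs whether or not poly() accepts factors in the installed R version. The contr.poly columns are orthonormal over the 5 levels, while the poly() columns are orthonormal over all 100 observations, so with 20 observations per level the estimates should differ by sqrt(20).

```r
# Sketch: contr.poly contrasts vs. an explicit orthogonal polynomial basis.
set.seed(1)                        # arbitrary seed, not taken from the thread
y <- rnorm(100)
g <- gl(5, 20, ordered = TRUE)     # 5 ordered levels, 20 observations each

# Polynomial contrasts, requested explicitly so options() cannot change them
fit_contr <- lm(y ~ g, contrasts = list(g = "contr.poly"))

# Orthogonal polynomial in the integer level codes 1..5
# (assumption: this is what poly() effectively sees for an ordered factor)
fit_poly <- lm(y ~ poly(as.integer(g), 4))

# Same column space, hence the same t values
t_contr <- unname(coef(summary(fit_contr))[-1, "t value"])
t_poly  <- unname(coef(summary(fit_poly))[-1, "t value"])

# Estimates differ only by the scaling of the basis columns
scaled_contr <- unname(coef(fit_contr)[-1]) * sqrt(20)
est_poly     <- unname(coef(fit_poly)[-1])

print(all.equal(t_poly, t_contr))
print(all.equal(est_poly, scaled_contr))
```

The same factor shows up in the quoted output: -0.182590696 * sqrt(20) is approximately -0.8166, the poly(g, 4)1 estimate.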
>
> >>
> >> Martin
> >>
> >> --
> >> Martin Maechler
> >> ETH Zurich and R Core team
> >>
> >> > --
> >> > Dr. Roland Fuß
> >>
> >> > Thünen-Institut für Agrarklimaschutz/
> >> > Thünen Institute of Climate-Smart Agriculture
> >>
> >> > Bundesallee 65
> >> > D-38116 Braunschweig, Germany
>
> I have committed a small, straightforward change to R-devel (only)
> such that poly(f, n) will now _again_ (as in R <= 4.1.0)
> signal an error if `f` is a factor but not an _ordered_ factor:
>
> ------------------------------------------------------------------------
> r88970 | maechler | 2025-10-25 12:48:37 +0200 (Sat, 25 Oct 2025) |
>
> poly(<factor>, ..) should error (unless it is.ordered(.))
> ------------------------------------------------------------------------
>
> Martin
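
For readers on a released R version, the behaviour described in the commit message can be approximated with a small wrapper. This is only a sketch: poly_checked() is a hypothetical helper, not the code committed in r88970, and treating an ordered factor via its integer level codes is an assumption about the intended semantics.

```r
# Hypothetical helper that mimics the described rule; NOT the r88970 code.
poly_checked <- function(x, degree = 1) {
  if (is.factor(x)) {
    if (!is.ordered(x))
      stop("'x' is an unordered factor; polynomial terms need an ordering")
    x <- as.integer(x)             # level codes 1, 2, ..., nlevels(x)
  }
  poly(x, degree = degree)
}

f_unordered <- gl(5, 20)                   # plain factor: should be rejected
f_ordered   <- gl(5, 20, ordered = TRUE)   # ordered factor: should be accepted

# Capture the error message instead of aborting the script
res <- tryCatch(poly_checked(f_unordered, 2),
                error = function(e) conditionMessage(e))
print(res)

# Ordered factors go through and yield one column per degree
print(dim(poly_checked(f_ordered, 2)))
```

The wrapper leaves numeric input untouched, so it can replace poly() in a model formula without changing existing fits.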