[R] Parameterization puzzle

Prof Brian D Ripley ripley at stats.ox.ac.uk
Fri Jul 21 10:40:07 CEST 2006

```On Fri, 21 Jul 2006, Berwin A Turlach wrote:

> G'day all,
>
>>>>>> "BDR" == Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
>
>    BDR> R does not know that poly(age,2) and poly(age,1) are linearly
>    BDR> dependent.
> Indeed, I also thought that this is the reason of the problem.
>
>    BDR> (And indeed they only are for some functions 'poly'.)
> 'poly' once more and more carefully.

My point was perhaps simpler: if you or I or Murray had a function poly()
in our workspace, that one would be found, I think.  (I have not checked
the ramifications of namespaces here, but I believe that would be the
intention, that formulae are evaluated in their defining environment.)  So
omly when the model matrix is set up could the linear dependence be known
(and there is nothing in the system poly() to tell model.matrix).

What is sometimes called extrinsic aliasing is left to the fitting
function, which seems to be to do a sensible job provided the main effect
is in the model.  Indeed, including interactions without main effects (as
Murray did) often makes the results hard to interpret.

>    BDR> I cannot reproduce your example ('l' is missing), [...]
> My guess is that 'l' is 'pyears'.  At least, I worked under that
> assumption.

Apparently l = log(pyears), which was my uncertain guess.

> Interestingly, on my machine (using R 2.3.1, 2.2.1 and 2.1.1) I cannot
> fit any of the Poisson GLM that Murray tried.  I always get the error
> message:
>
> Error: no valid set of coefficients has been found: please supply starting values

Related to the offset, I believe.

>
> But I have to investigate this further.  I can fit binomial models
> that give me similar answers.
>
>    BDR> [...] but perhaps
>    BDR> glm(deaths ~ poly(age,2) + poly(age,1)*Smoke + offset(l),
>    BDR> poisson)
> In this parameterisation a 'poly(age,1)' term will appear among the
> coefficients with an estimated value of NA since it is aliased with
> 'poly(age, 2)1'.  So I don't believe that this was Murray's intention.
>
> The only suggestion I can come up with is:
>
>> summary(glm(cbind(deaths, l-deaths) ~ age*Smoke+I(age^2), family=binomial))
>
> [...]
>
> Coefficients:
>              Estimate Std. Error z value Pr(>|z|)
> (Intercept)  -10.79895    0.45149 -23.918  < 2e-16 ***
> age            2.37892    0.20877  11.395  < 2e-16 ***
> SmokeYes       1.44573    0.37347   3.871 0.000108 ***
> I(age^2)      -0.19706    0.02749  -7.168  7.6e-13 ***
> age:SmokeYes  -0.30850    0.09756  -3.162 0.001566 **
>
> [...]
>
> Which doesn't use orthogonal polynomials anymore.  But I don't see how
> you can fit the model that Murray want to fit using orthogonal
> polynomials given the way R's model language operates.
>
> So I guess the Poisson GLM that Murray wants to fit is:
>
>        glm(deaths~ age*Smoke+I(age^2)+offset(l), family=poisson)
>
> Cheers,
>
>        Berwin
>
> Berwin A Turlach                      Tel.: +61 (8) 6488 3338 (secr)
> School of Mathematics and Statistics        +61 (8) 6488 3383 (self)
> The University of Western Australia   FAX : +61 (8) 6488 1028
> 35 Stirling Highway
> Crawley WA 6009                e-mail: berwin at maths.uwa.edu.au
> Australia                        http://www.maths.uwa.edu.au/~berwin
>
>

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

```