[R] Parameterization puzzle
Murray Jorgensen
maj at waikato.ac.nz
Sat Jul 22 00:57:56 CEST 2006
Thanks to Brian and Berwin with there help. I faced a double problem in
that I not only wanted to fit the model but I also wanted to do so in
such a way that it would seem natural for a classroom example.
The moral seems to be that I should do the orthogonal polynomial stuff
outside the model formula. Here then is my solution:
pyears <- scan()
18793 52407 10673 43248 5710 28612 2585 12663 1462 5317
deaths <- scan()
2 32 12 104 28 206 28 186 31 102
l <- log(pyears)
Smoke <- gl(2,1,10,labels=c("No","Yes"))
Age <- gl(5,2,10,labels=c("35-44","45-54","55-64","65-74","75-84"),
ordered=TRUE)
mod1.glm <- glm(deaths ~ Age * Smoke + offset(l),family=poisson)
summary(mod1.glm)
age <- as.numeric(Age)
age1 <- poly(age,2)[,1]
age2 <- poly(age,2)[,2]
mod2.glm <- aso1.glm <- glm(deaths ~ age1 + age2 + Smoke +
age1:Smoke + offset(l),family=poisson)
summary(mod2.glm)
The final summary then comes out looking like this:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.8368 0.1213 -48.103 < 2e-16 ***
age1 5.3238 0.4129 12.893 < 2e-16 ***
age2 -1.0460 0.1448 -7.223 5.08e-13 ***
SmokeYes 0.5183 0.1262 4.106 4.02e-05 ***
age1:SmokeYes -1.3755 0.4340 -3.169 0.00153 **
which is just what I wanted.
Cheers, Murray Jorgensen
Prof Brian D Ripley wrote:
> On Fri, 21 Jul 2006, Berwin A Turlach wrote:
>
>> G'day all,
>>
>>>>>>> "BDR" == Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
>>
>> BDR> R does not know that poly(age,2) and poly(age,1) are linearly
>> BDR> dependent.
>> Indeed, I also thought that this is the reason of the problem.
>>
>> BDR> (And indeed they only are for some functions 'poly'.)
>> I am surprised about this. Should probably read the help page of
>> 'poly' once more and more carefully.
>
> My point was perhaps simpler: if you or I or Murray had a function
> poly() in our workspace, that one would be found, I think. (I have not
> checked the ramifications of namespaces here, but I believe that would
> be the intention, that formulae are evaluated in their defining
> environment.) So omly when the model matrix is set up could the linear
> dependence be known (and there is nothing in the system poly() to tell
> model.matrix).
>
> What is sometimes called extrinsic aliasing is left to the fitting
> function, which seems to be to do a sensible job provided the main
> effect is in the model. Indeed, including interactions without main
> effects (as Murray did) often makes the results hard to interpret.
>
>> BDR> I cannot reproduce your example ('l' is missing), [...]
>> My guess is that 'l' is 'pyears'. At least, I worked under that
>> assumption.
>
> Apparently l = log(pyears), which was my uncertain guess.
>
>> Interestingly, on my machine (using R 2.3.1, 2.2.1 and 2.1.1) I cannot
>> fit any of the Poisson GLM that Murray tried. I always get the error
>> message:
>>
>> Error: no valid set of coefficients has been found: please supply
>> starting values
>
> Related to the offset, I believe.
>
>>
>> But I have to investigate this further. I can fit binomial models
>> that give me similar answers.
>>
>> BDR> [...] but perhaps
>> BDR> glm(deaths ~ poly(age,2) + poly(age,1)*Smoke + offset(l),
>> BDR> poisson)
>> BDR> was your intention?
>> In this parameterisation a 'poly(age,1)' term will appear among the
>> coefficients with an estimated value of NA since it is aliased with
>> 'poly(age, 2)1'. So I don't believe that this was Murray's intention.
>>
>> The only suggestion I can come up with is:
>>
>>> summary(glm(cbind(deaths, l-deaths) ~ age*Smoke+I(age^2),
>>> family=binomial))
>>
>> [...]
>>
>> Coefficients:
>> Estimate Std. Error z value Pr(>|z|)
>> (Intercept) -10.79895 0.45149 -23.918 < 2e-16 ***
>> age 2.37892 0.20877 11.395 < 2e-16 ***
>> SmokeYes 1.44573 0.37347 3.871 0.000108 ***
>> I(age^2) -0.19706 0.02749 -7.168 7.6e-13 ***
>> age:SmokeYes -0.30850 0.09756 -3.162 0.001566 **
>>
>> [...]
>>
>> Which doesn't use orthogonal polynomials anymore. But I don't see how
>> you can fit the model that Murray want to fit using orthogonal
>> polynomials given the way R's model language operates.
>>
>> So I guess the Poisson GLM that Murray wants to fit is:
>>
>> glm(deaths~ age*Smoke+I(age^2)+offset(l), family=poisson)
>>
>> Cheers,
>>
>> Berwin
>>
>> ========================== Full address ============================
>> Berwin A Turlach Tel.: +61 (8) 6488 3338 (secr)
>> School of Mathematics and Statistics +61 (8) 6488 3383 (self)
>> The University of Western Australia FAX : +61 (8) 6488 1028
>> 35 Stirling Highway
>> Crawley WA 6009 e-mail: berwin at maths.uwa.edu.au
>> Australia http://www.maths.uwa.edu.au/~berwin
>>
>>
>
--
Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz Fax 7 838 4155
Phone +64 7 838 4773 wk Home +64 7 825 0441 Mobile 021 1395 862
More information about the R-help
mailing list