[R] analysis of covariance and constrained parameters
Steven Orzack
orzack at freshpond.org
Fri Feb 16 22:14:51 CET 2018
Consider an analysis of covariance involving age and cohort. The goal is
to assess whether the influence of cohort depends on age. The simplest
case involves data such as the following:
value Age Cohort
x1 1 3
x2 1 4
x3 1 5
x4 2 3
x5 2 4
x6 2 5
etc.
Age is a factor. The numeric response variable is value, and Cohort is a
numeric predictor. So (pseudo-code) commands to estimate the
age-specific relationship between value and Cohort could be
glm(value ~ Age/Cohort - 1, family =......, data = .....)
glm(value ~ Age/(Cohort + I(Cohort^2)) - 1, family =......, data = .....)
The latter command would provide estimates of the age-specific
intercept, linear, and quadratic coefficients, as in
value_Age1 <- intercept_Age1 + linear_Age1*Cohort + quad_Age1*Cohort^2
value_Age2 <- intercept_Age2 + linear_Age2*Cohort + quad_Age2*Cohort^2
This is standard. One would choose among the above models via analysis
of variance or AIC.
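
To make the comparison concrete, here is a minimal sketch on simulated
data. The data frame dat, the gaussian family, and the simulated
coefficients are assumptions chosen only for illustration; they are not
my real data or my real error family.

```r
# Hypothetical data: two ages, ten cohorts each (values are simulated).
set.seed(1)
dat <- expand.grid(Cohort = 3:12, Age = factor(1:2))
dat$value <- ifelse(dat$Age == "1",
                    2,                                        # flat for Age 1
                    1 + 0.5 * dat$Cohort - 0.03 * dat$Cohort^2) +
  rnorm(nrow(dat), sd = 0.2)

# Age-specific intercepts and linear slopes (gaussian family is assumed).
fit_lin  <- glm(value ~ Age/Cohort - 1, family = gaussian, data = dat)

# Age-specific intercepts, linear, and quadratic coefficients.
fit_quad <- glm(value ~ Age/(Cohort + I(Cohort^2)) - 1,
                family = gaussian, data = dat)

# Compare the nested models by an F test and by AIC.
anova(fit_lin, fit_quad, test = "F")
AIC(fit_lin, fit_quad)
```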
Now assume that I have external knowledge that tells me that there is NO
influence of Cohort on value for Age1 and that there could be up to a
quadratic influence for Age2. Accordingly, I would like to fit a model
that estimates these relationships:
value_Age1 <- intercept_Age1 (+ 0*Cohort + 0*Cohort^2)
(which is, of course, value_Age1 <- intercept_Age1)
value_Age2 <- intercept_Age2 + linear_Age2*Cohort + quad_Age2*Cohort^2
What is the glm syntax to fit this model? It is a model in which (two)
coefficients for one level of the factor are constrained to a particular
value (0), while there is no such constraint for the second level of the
factor.
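
One candidate formulation, offered as a sketch rather than a confirmed
answer, is to build Cohort terms that are switched on only for Age level
"2", so that no Cohort coefficient exists for Age1 at all. The column
names Cohort_A2 and Cohort2_A2, the simulated data, and the gaussian
family are all assumptions made for illustration.

```r
# Hypothetical data: two ages, ten cohorts each (values are simulated).
set.seed(1)
dat <- expand.grid(Cohort = 3:12, Age = factor(1:2))
dat$value <- ifelse(dat$Age == "1",
                    2,
                    1 + 0.5 * dat$Cohort - 0.03 * dat$Cohort^2) +
  rnorm(nrow(dat), sd = 0.2)

# Predictors that equal Cohort (or Cohort^2) for Age 2 and 0 for Age 1.
dat$Cohort_A2  <- (dat$Age == "2") * dat$Cohort
dat$Cohort2_A2 <- (dat$Age == "2") * dat$Cohort^2

# Two age-specific intercepts plus linear and quadratic terms for Age 2 only.
fit_con <- glm(value ~ Age + Cohort_A2 + Cohort2_A2 - 1,
               family = gaussian, data = dat)
coef(fit_con)
```

With this coding the model has four coefficients instead of six, and
under the gaussian assumption the Age1 intercept is simply the mean of
value over the Age1 rows.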
Please note that I understand that
glm(value ~ Age/(Cohort + I(Cohort^2)) - 1, family =......, data = .....)
generates point estimates of the linear and quadratic coefficients for
Age1 (as above) and that one could inspect them to determine whether
they differ significantly from 0.
However, I want to incorporate the knowledge that these coefficients
MUST BE 0 into my hypothesis testing. Knowing that these coefficients
are 0 could influence the results of anova and AIC comparisons, since it
reduces the number of estimated parameters (and hence the degrees of
freedom) associated with the model.
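
The effect on degrees of freedom can be illustrated as follows. This is
only a sketch: the data are simulated, the gaussian family is assumed,
and the indicator-based formula for the constrained fit is one possible
encoding of the constraint, not an established answer.

```r
# Hypothetical data: two ages, ten cohorts each (values are simulated).
set.seed(1)
dat <- expand.grid(Cohort = 3:12, Age = factor(1:2))
dat$value <- ifelse(dat$Age == "1",
                    2,
                    1 + 0.5 * dat$Cohort - 0.03 * dat$Cohort^2) +
  rnorm(nrow(dat), sd = 0.2)

# Unconstrained: linear and quadratic coefficients for both ages.
fit_full <- glm(value ~ Age/(Cohort + I(Cohort^2)) - 1,
                family = gaussian, data = dat)

# Constrained: Cohort terms multiplied by an Age-2 indicator, so the
# Age-1 linear and quadratic coefficients are fixed at 0 by construction.
fit_con <- glm(value ~ Age + I((Age == "2") * Cohort) +
                 I((Age == "2") * Cohort^2) - 1,
               family = gaussian, data = dat)

# The constrained fit estimates two fewer coefficients.
df.residual(fit_con) - df.residual(fit_full)

# Nested comparison and AIC, with the constraint built into the model.
anova(fit_con, fit_full, test = "F")
AIC(fit_con, fit_full)
```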
Many thanks for suggestions in advance!
--
Steven Orzack
Fresh Pond Research Institute
173 Harvey Street
Cambridge, MA 02140
617 864-4307
www.freshpond.org