[R-sig-ME] Testing a trend across possibly non-independent estimates
Steven Orzack
orzack at freshpond.org
Thu Sep 10 17:02:59 CEST 2015
A colleague and I have a longitudinal dataset in which records are organized as follows:
individual_ID_1 age cohort health_at_age
individual_ID_1 age+1 cohort health_at_age+1
individual_ID_2 age cohort health_at_age
individual_ID_2 age+1 cohort health_at_age+1
etc.
Call this longitudinal.df.
There are a few thousand individuals, twenty or so cohorts, and up to about 15 health scores (each at a different age) per individual.
Health is scored as present/absent (1/0).
We wish to assess a hypothesis about the trend across ages in how health at a given age changes over cohorts.
In particular, one hypothesis is that, say, the slope of a regression with cohort as the predictor and health as the response is negative at younger ages and positive at older ages.
Note that there are two ways to derive a slope estimate for a given age.
First way: for a given age, selected by a logical vector sel, one can fit a GLM:
glm(health ~ 1 + cohort, family = binomial, data = longitudinal.df,
    subset = sel)
Repeating this for each age yields the ensemble of age-specific slope estimates.
A GLM is adequate here because an individual contributes at most one health record to any age-specific GLM.
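A minimal sketch of the first way in R follows. Because the real data are not available, it builds hypothetical simulated data in the same layout: 300 individuals, cohorts 1 to 20, ages 1 to 15, a random intercept per individual, and an assumed true cohort slope of 0.02 * (age - 8) on the logit scale, so negative at young ages and positive at old ages. None of these numbers come from the study. The sketch fits one binomial GLM per age and collects the cohort slope and its standard error.

```r
set.seed(42)
# Hypothetical simulated data in the layout of longitudinal.df (not real data):
# 300 individuals, cohorts 1-20, ages 1-15, a random intercept u per individual,
# and an assumed true cohort slope of 0.02 * (age - 8) on the logit scale
ind <- data.frame(individual_ID = 1:300,
                  cohort = sample(1:20, 300, replace = TRUE),
                  u = rnorm(300, sd = 0.5))
longitudinal.df <- merge(ind, data.frame(age = 1:15))  # every individual at every age
eta <- with(longitudinal.df, -0.2 + 0.02 * (age - 8) * (cohort - 10) + u)
longitudinal.df$health <- rbinom(nrow(longitudinal.df), 1, plogis(eta))

# First way: a separate binomial GLM for each age
ages <- sort(unique(longitudinal.df$age))
per_age <- t(sapply(ages, function(a) {
  sel <- longitudinal.df$age == a   # logical selector for this age
  fit <- glm(health ~ 1 + cohort, family = binomial,
             data = longitudinal.df, subset = sel)
  coef(summary(fit))["cohort", c("Estimate", "Std. Error")]
}))
slopes <- data.frame(age = ages,
                     slope = per_age[, "Estimate"],
                     se = per_age[, "Std. Error"])
print(slopes)
```

In real data some individuals lack some ages, which changes the sample size per fit but not the code.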
Second way: one could fit a GLMM:
library(lme4)  # provides glmer
glmer(health ~ 1 + cohort * as.factor(age) + (cohort | individual_ID),
      data = longitudinal.df, family = binomial)
The GLMM yields age-specific intercepts and slopes.
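With the `cohort * as.factor(age)` parameterization, each age-specific slope is the `cohort` main effect plus an interaction term relative to a reference age. The sketch below uses an equivalent reparameterization that reports one intercept and one slope per age directly. To stay runnable with base R only, it fits `glm` to hypothetical simulated data; the same fixed-effects formula can be given to `lme4::glmer` together with a random-effects term. In a plain GLM this single fit reproduces the separate per-age fits of the first way, because the likelihood splits by age; in the GLMM the random effects tie the ages together, so the estimates would differ somewhat.

```r
set.seed(42)
# Hypothetical simulated data in the layout of longitudinal.df (not real data):
# 300 individuals, cohorts 1-20, ages 1-15, a random intercept u per individual,
# and an assumed true cohort slope of 0.02 * (age - 8) on the logit scale
ind <- data.frame(individual_ID = 1:300,
                  cohort = sample(1:20, 300, replace = TRUE),
                  u = rnorm(300, sd = 0.5))
longitudinal.df <- merge(ind, data.frame(age = 1:15))  # every individual at every age
eta <- with(longitudinal.df, -0.2 + 0.02 * (age - 8) * (cohort - 10) + u)
longitudinal.df$health <- rbinom(nrow(longitudinal.df), 1, plogis(eta))

# One intercept and one cohort slope per age, with no reference age
fit <- glm(health ~ 0 + factor(age) + factor(age):cohort,
           family = binomial, data = longitudinal.df)
b <- coef(fit)
age_intercepts <- b[!grepl(":cohort$", names(b))]  # names like "factor(age)3"
age_slopes <- b[grepl(":cohort$", names(b))]       # names like "factor(age)3:cohort"
print(round(age_slopes, 3))
```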
As it happens, this GLMM has substantially more support (lower AIC) than the GLMM without age as a factor, even though it has many more parameters. Hence, it appears that the trends over cohorts differ across ages. Of course, this shows only that there are differences among ages; by itself, it says nothing about how the trend changes over ages.
I welcome opinions on the merits of the GLM approach and the GLMM approach. I regard the latter as more appropriate and statistically proper.
The main question is: given an ensemble of slope estimates (derived by GLM or by GLMM), and given that the data underlying any one slope estimate cannot be assumed independent of the data underlying the other slope estimates (because most individuals contribute a health record at multiple ages), how does one statistically assess whether there is an increasing trend across ages in how health changes over cohorts (as measured by the sign and magnitude of the slope for a given age)?
If the data underlying any one slope estimate were known to be independent of the data underlying any other estimate, the assessment would be straightforward (using a correlation or regression of the slope estimates on age).
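One possible way to handle the dependence, sketched here under stated assumptions, is to estimate all age-specific slopes in a single model, express the trend across ages as a linear contrast of those slopes (the least-squares slope of slope on age), and take its standard error from a covariance matrix that lets records from the same individual be correlated. In this sketch that matrix is a cluster-robust (sandwich) estimate computed by hand for the pooled GLM, as in GEE with an independence working correlation and with no small-sample correction; this is a swapped-in technique, not the GLMM. With a fitted `glmer` model, `vcov()` of the fixed effects could play the same role, to the extent the random effects capture the dependence. The data are hypothetical simulated data, and the naive standard error, which treats the per-age estimates as independent, is printed for comparison.

```r
set.seed(42)
# Hypothetical simulated data in the layout of longitudinal.df (not real data):
# 300 individuals, cohorts 1-20, ages 1-15, a random intercept u per individual,
# and an assumed true cohort slope of 0.02 * (age - 8) on the logit scale
ind <- data.frame(individual_ID = 1:300,
                  cohort = sample(1:20, 300, replace = TRUE),
                  u = rnorm(300, sd = 0.5))
longitudinal.df <- merge(ind, data.frame(age = 1:15))  # every individual at every age
eta <- with(longitudinal.df, -0.2 + 0.02 * (age - 8) * (cohort - 10) + u)
longitudinal.df$health <- rbinom(nrow(longitudinal.df), 1, plogis(eta))

fit <- glm(health ~ 0 + factor(age) + factor(age):cohort,
           family = binomial, data = longitudinal.df)
is_slope <- grepl(":cohort$", names(coef(fit)))
ages <- sort(unique(longitudinal.df$age))

# Cluster-robust covariance: bread %*% meat %*% bread, with score contributions
# summed within each individual (logit link: score = x * (y - mu))
U <- model.matrix(fit) * residuals(fit, type = "response")
U_ind <- rowsum(U, longitudinal.df$individual_ID)
V_robust <- vcov(fit) %*% crossprod(U_ind) %*% vcov(fit)

# Contrast weights giving the least-squares slope of the age-specific slopes on age
w <- (ages - mean(ages)) / sum((ages - mean(ages))^2)
trend <- sum(w * coef(fit)[is_slope])
se_naive <- sqrt(drop(t(w) %*% vcov(fit)[is_slope, is_slope] %*% w))
se_robust <- sqrt(drop(t(w) %*% V_robust[is_slope, is_slope] %*% w))
p_value <- 2 * pnorm(-abs(trend / se_robust))
print(c(trend = trend, se_naive = se_naive, se_robust = se_robust, p = p_value))
```

A positive `trend` corresponds to cohort slopes that increase with age, which is the hypothesis stated above.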
A related question stems from my discomfort with using only the slope estimate to assess the trend across cohorts at a given age. Is there a more "synthetic" way to do this, one based on the trend determined by all of the regression coefficients (intercept, slope, etc.)?
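One way to combine intercepts and slopes, offered as a possibility rather than a recommendation, is to summarize each age by the model-predicted change in the probability of good health between the earliest and the latest cohort; on the probability scale this change depends on both coefficients through the inverse logit. The sketch uses the per-age reparameterized `glm` on hypothetical simulated data, with the minimum and maximum observed cohort as assumed endpoints.

```r
set.seed(42)
# Hypothetical simulated data in the layout of longitudinal.df (not real data):
# 300 individuals, cohorts 1-20, ages 1-15, a random intercept u per individual,
# and an assumed true cohort slope of 0.02 * (age - 8) on the logit scale
ind <- data.frame(individual_ID = 1:300,
                  cohort = sample(1:20, 300, replace = TRUE),
                  u = rnorm(300, sd = 0.5))
longitudinal.df <- merge(ind, data.frame(age = 1:15))  # every individual at every age
eta <- with(longitudinal.df, -0.2 + 0.02 * (age - 8) * (cohort - 10) + u)
longitudinal.df$health <- rbinom(nrow(longitudinal.df), 1, plogis(eta))

fit <- glm(health ~ 0 + factor(age) + factor(age):cohort,
           family = binomial, data = longitudinal.df)
ages <- sort(unique(longitudinal.df$age))
first <- data.frame(age = ages, cohort = min(longitudinal.df$cohort))
last  <- data.frame(age = ages, cohort = max(longitudinal.df$cohort))

# Predicted P(health) at each age for the earliest and latest cohorts;
# each prediction uses that age's intercept and slope together
p_first <- predict(fit, newdata = first, type = "response")
p_last  <- predict(fit, newdata = last, type = "response")
synthetic <- data.frame(age = ages, change = p_last - p_first)
print(round(synthetic, 3))
```

The trend across ages in `change` can then be assessed with the same dependence-aware tools as the slopes themselves.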
Note that in this analysis the random effects are nuisance parameters. Accounting for them is good practice and important, but I doubt that any association between measures for a given individual influences the biological bottom line of the analysis. Nonetheless, I do not want to whistle past the graveyard.
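A check that does not rely on any model for the within-individual association is a cluster (individual-level) bootstrap: resample individuals with replacement, keeping all of each individual's records together, refit the per-age slopes, and recompute the trend statistic. The sketch below assumes the least-squares slope of the age-specific slopes on age as the statistic, hypothetical simulated data, 200 resamples, and a simple percentile interval; all of these are illustrative choices.

```r
set.seed(42)
# Hypothetical simulated data in the layout of longitudinal.df (not real data):
# 300 individuals, cohorts 1-20, ages 1-15, a random intercept u per individual,
# and an assumed true cohort slope of 0.02 * (age - 8) on the logit scale
ind <- data.frame(individual_ID = 1:300,
                  cohort = sample(1:20, 300, replace = TRUE),
                  u = rnorm(300, sd = 0.5))
longitudinal.df <- merge(ind, data.frame(age = 1:15))  # every individual at every age
eta <- with(longitudinal.df, -0.2 + 0.02 * (age - 8) * (cohort - 10) + u)
longitudinal.df$health <- rbinom(nrow(longitudinal.df), 1, plogis(eta))
ages <- sort(unique(longitudinal.df$age))

# Trend statistic: least-squares slope of the age-specific cohort slopes on age
# (assumes every age is present in each resample, true for these balanced data)
trend_stat <- function(dat) {
  fit <- glm(health ~ 0 + factor(age) + factor(age):cohort,
             family = binomial, data = dat)
  b <- coef(fit)
  s <- b[grepl(":cohort$", names(b))]
  sum((ages - mean(ages)) * s) / sum((ages - mean(ages))^2)
}

ids <- unique(longitudinal.df$individual_ID)
rows_by_id <- split(seq_len(nrow(longitudinal.df)), longitudinal.df$individual_ID)
boot <- replicate(200, {
  draw <- sample(ids, length(ids), replace = TRUE)  # resample individuals, not records
  trend_stat(longitudinal.df[unlist(rows_by_id[as.character(draw)]), ])
})
observed <- trend_stat(longitudinal.df)
ci <- quantile(boot, c(0.025, 0.975))
print(c(observed = observed, ci))
```

Resampling whole individuals preserves whatever dependence exists among an individual's records, which is why no random-effects structure needs to be specified for this check.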
Many thanks in advance for suggestions!
--
Steven Orzack