[R-sig-ME] Testing a trend across possibly non-independent estimates

Thu Sep 10 17:02:59 CEST 2015

A colleague and I have a longitudinal dataset in which records are 
organized as follows

individual_ID_1    age      cohort    health_at_age
individual_ID_1    age+1    cohort    health_at_age+1
individual_ID_2    age      cohort    health_at_age
individual_ID_2    age+1    cohort    health_at_age+1

etc.

Call this longitudinal.df.

There are a few thousand individuals, twenty or so cohorts, and up to 15
or so health scores (each at a different age) for a given individual.

Health is scored as present/absent (1/0).

We wish to assess a hypothesis about the trend across ages of how health 
at a given age changes
over cohorts.

In particular, one hypothesis is that, say, the slope parameter for a 
regression (cohort is
predictor variable, health is the response variable) is negative for 
younger ages and is positive
for older ages.

Note that there are two ways one could derive a slope estimate for a 
given age.

First way: for a given age, selected by a logical vector sel, one can
fit a GLM

glm(health ~ 1 + cohort, family = binomial, data = longitudinal.df, 
subset = sel)

Repeat this for each age to derive the age-specific estimate to be added 
to the ensemble of slope
estimates.

A GLM works because an individual contributes only one health record to
a given age-specific GLM, so the records within any one fit are independent.
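
To make the first way concrete, here is a minimal sketch of the loop over
ages. The data are simulated stand-ins (the sample sizes, age range, and
cohort coding are assumptions, not our real values), and fit_one_age is a
hypothetical helper name.

```r
# Simulated stand-in for longitudinal.df (hypothetical values, not real data)
set.seed(1)
n_id <- 300
ages <- 60:64
longitudinal.df <- expand.grid(individual_ID = seq_len(n_id), age = ages)
# Each individual belongs to exactly one cohort (coded 1..20 here)
cohort_of_id <- sample(1:20, n_id, replace = TRUE)
longitudinal.df$cohort <- cohort_of_id[longitudinal.df$individual_ID]
longitudinal.df$health <- rbinom(nrow(longitudinal.df), 1, 0.5)

# Fit one binomial GLM per age; keep the cohort slope and its standard error
fit_one_age <- function(a) {
  sel <- longitudinal.df$age == a
  fit <- glm(health ~ 1 + cohort, family = binomial,
             data = longitudinal.df, subset = sel)
  est <- coef(summary(fit))["cohort", c("Estimate", "Std. Error")]
  data.frame(age = a, slope = est[["Estimate"]], se = est[["Std. Error"]])
}

# The "ensemble" of age-specific slope estimates, one row per age
slopes <- do.call(rbind, lapply(ages, fit_one_age))
print(slopes)
```

Keeping the standard errors alongside the slopes seems worthwhile, since any
later assessment of a trend would presumably need them.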

Second way: one could use a GLMM

glmer(health ~ 1 + cohort * as.factor(age) + (cohort | individual_ID),
data = longitudinal.df,
family = binomial)

The GLMM model generates age-specific intercepts and slopes.
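
For clarity about what "age-specific slopes" means here: with R's default
treatment contrasts, the slope for the reference age is the cohort
coefficient, and the slope for any other age is that coefficient plus the
matching cohort:age interaction. The sketch below shows only this
arithmetic. It uses simulated data and a plain glm() as a stand-in so that
it runs without lme4; age_specific_slopes is a hypothetical helper, and it
assumes ages is sorted with the reference level first.

```r
# Simulated stand-in data (hypothetical values, not real data)
set.seed(2)
n_id <- 300
ages <- 60:63
d <- expand.grid(individual_ID = seq_len(n_id), age = ages)
d$cohort <- sample(1:20, n_id, replace = TRUE)[d$individual_ID]
d$health <- rbinom(nrow(d), 1, plogis(-0.5 + 0.03 * d$cohort))

# Combine fixed-effect coefficients into one cohort slope per age.
# beta: named coefficient vector; ages: sorted, first element = reference level
age_specific_slopes <- function(beta, ages) {
  ref   <- beta[["cohort"]]
  inter <- beta[paste0("cohort:as.factor(age)", ages[-1])]
  out   <- c(ref, ref + unname(inter))
  names(out) <- ages
  out
}

# Fixed-effects-only stand-in with the same fixed part as the GLMM formula
fit <- glm(health ~ 1 + cohort * as.factor(age), family = binomial, data = d)
slopes <- age_specific_slopes(coef(fit), ages)
print(slopes)
```

For a fitted glmer object one would pass lme4::fixef(fit) instead of
coef(fit); the coefficient names follow the same pattern, although the
estimates themselves will differ once the random effects are included.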

As it happens, this GLMM has substantially more support (lower AIC) than
does the GLMM without age as a factor (even though it has many more
parameters). Hence, it appears that trends over cohorts differ across
ages. Of course, this only means that there are differences among ages.
By itself, it says nothing about how the trend changes over ages.

I welcome opinions as to the merits of the GLM approach and the GLMM 
approach. I regard the latter
as more appropriate and statistically proper.

The main question is:

Given an ensemble of slope estimates (derived by GLM or by GLMM) and 
that the data underlying any
one slope estimate cannot be assumed to be independent of the data 
underlying the other slope
estimates (because most individuals contribute a health record for 
multiple ages),

How does one statistically assess whether there is an increasing trend
across ages of how health changes over cohorts (as measured by the sign
and magnitude of the slope for a given age)?

If the data underlying any one slope estimate were known to be
independent of the data underlying any other estimate, the assessment
would be straightforward (using a correlation or regression).
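
To spell out that independent case, here is a rough sketch. It simulates
data in which every record comes from a different individual, so the
age-specific fits really are independent, and then regresses the slope
estimates on age. The inverse-variance weights and the Spearman test are my
choices for illustration, and the simulated effect sizes are assumptions,
not estimates from our data.

```r
# Simulated data with one record per individual, so the per-age fits
# are independent by construction (hypothetical values, not real data)
set.seed(3)
ages <- 60:69
n <- 4000
d <- data.frame(age    = sample(ages, n, replace = TRUE),
                cohort = sample(1:20, n, replace = TRUE))
# Assumed truth: the cohort slope rises linearly with age, crossing zero mid-range
true_slope <- 0.02 * (d$age - 64.5)
d$health <- rbinom(n, 1, plogis(true_slope * (d$cohort - 10.5)))

# Age-specific cohort slopes and standard errors from separate GLMs
est <- t(sapply(ages, function(a) {
  fit <- glm(health ~ cohort, family = binomial, data = d[d$age == a, ])
  coef(summary(fit))["cohort", c("Estimate", "Std. Error")]
}))
slopes <- data.frame(age = ages, slope = est[, "Estimate"],
                     se = est[, "Std. Error"])

# Regression of slope on age, weighting each estimate by its precision
trend <- lm(slope ~ age, data = slopes, weights = 1 / se^2)
print(coef(summary(trend)))

# Rank correlation as a distribution-free alternative
print(cor.test(slopes$age, slopes$slope, method = "spearman"))
```

Neither test is valid as it stands for our data, since the same individuals
contribute to several age-specific estimates.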

A related question stems from my discomfort about only using the slope 
estimate to assess the trend
across cohorts for a given age. Is there a more "synthetic" way to do 
this, one that is based upon
the trend determined by all of the regression coefficients (intercept, 
slope, etc.)?

Note that in this analysis, random effects are nuisance parameters.
Accounting for them is good practice and important, but I doubt that any
association between measures for a given individual influences the
biological bottom line of the analysis. Nonetheless, I do not want to
whistle past the graveyard...

Many thanks in advance for suggestions!

-- 
Steven Orzack