[R-sig-ME] Testing a trend across possibly non-independent estimates
Thierry Onkelinx
thierry.onkelinx at inbo.be
Fri Sep 11 11:02:43 CEST 2015
Dear Steven,
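I assume that each individual remains in the same cohort. In that case a
random slope of cohort is pointless: cohort does not vary within an
individual, so there is nothing from which to estimate an
individual-specific slope.
Are the cohorts age groups?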
I'd first try to write down the equation of the model you have in mind. I
have the feeling that you're stuck at that point. Consulting a local
statistician might be a good idea.
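As a rough sketch of what writing down the model could look like (this is
only one candidate, not necessarily the model you have in mind), assume
cohort is numeric and that the cohort slope changes linearly with age:

  logit(P(health_ij = 1)) = b0 + b1*cohort_i + b2*age_ij
                            + b3*cohort_i*age_ij + u_i,   u_i ~ N(0, sigma^2)

with i indexing individuals and j the repeated records. Under those
assumptions b3 is the trend in the cohort slope across ages, and it is
estimated within a single model instead of by comparing separately fitted
slopes. The data below are simulated, and the names d, cohort_c and age_c are
hypothetical stand-ins, not columns of your longitudinal.df.

```r
library(lme4)

# Simulated stand-in for longitudinal.df; every value here is made up.
set.seed(1)
n_id <- 300
ages <- 1:6
d <- expand.grid(individual_ID = seq_len(n_id), age = ages)

cohort_of_id <- sample(1:20, n_id, replace = TRUE)  # one cohort per individual
u_of_id      <- rnorm(n_id, sd = 0.8)               # individual random intercepts

d$cohort   <- cohort_of_id[d$individual_ID]
d$cohort_c <- d$cohort - 10          # centred (assumption: numeric cohort)
d$age_c    <- d$age - mean(ages)     # centred (assumption: linear age effect)

# True model used for the simulation: the cohort slope increases with age.
eta <- -0.2 + 0.03 * d$cohort_c * d$age_c + u_of_id[d$individual_ID]
d$health <- rbinom(nrow(d), 1, plogis(eta))

# Cohort is constant within an individual, so only a random intercept is fitted.
fit <- glmer(health ~ cohort_c * age_c + (1 | individual_ID),
             data = d, family = binomial)

# The interaction row is the estimated change in the cohort slope per unit age.
trend <- coef(summary(fit))["cohort_c:age_c", ]
print(trend)
```

If a linear change of the slope with age is too restrictive, age could enter
as a factor instead, but then the trend has to be expressed as a contrast of
the interaction coefficients.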
Best regards,
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
2015-09-10 17:02 GMT+02:00 Steven Orzack <orzack at freshpond.org>:
> A colleague and I have a longitudinal dataset in which records are
> organized as follows
>
> individual_ID_1 age cohort health_at_age
> individual_ID_1 age+1 cohort health_at_age+1
> individual_ID_2 age cohort health_at_age
> individual_ID_2 age+1 cohort health_at_age+1
>
> etc.
>
> call this longitudinal.df
>
> There are a few thousand individuals, twenty or so cohorts, and up to 15
> or so health scores
> (each at a different age) for a given individual.
>
> Health is scored as present/absent (1/0).
>
> We wish to assess a hypothesis about the trend across ages of how health
> at a given age changes
> over cohorts.
>
> In particular, one hypothesis is that, say, the slope parameter for a
> regression (cohort is
> predictor variable, health is the response variable) is negative for
> younger ages and is positive
> for older ages.
>
> Note that there are two ways one could derive a slope estimate for a given
> age.
>
> First way: for a given age defined by a logical variable sel, one can use
> a GLM
>
> glm(health ~ 1 + cohort, family = binomial, data = longitudinal.df,
>     subset = sel)
>
> Repeat this for each age to derive the age-specific estimate to be added
> to the ensemble of slope
> estimates.
>
> A GLM works because an individual contributes only one health record for a
> given age-specific GLM.
>
> Second way: one could use a GLMM
>
> glmer(health ~ 1 + cohort * as.factor(age) + (cohort | individual_ID),
>       data = longitudinal.df, family = binomial)
>
> The GLMM model generates age-specific intercepts and slopes.
>
> As it happens, this GLMM does have substantially more support (AIC) than
> does the GLMM without age as a factor (even though it has many more
> parameters). Hence, it appears that trends over cohorts differ across
> ages. Of course, this only means that there are differences among ages.
> By itself, it says nothing about how the trends change over ages.
>
> I welcome opinions as to the merits of the GLM approach and the GLMM
> approach. I regard the latter
> as more appropriate and statistically proper.
>
> The main question is:
>
> Given an ensemble of slope estimates (derived by GLM or by GLMM) and that
> the data underlying any
> one slope estimate cannot be assumed to be independent of the data
> underlying the other slope
> estimates (because most individuals contribute a health record for
> multiple ages),
>
> How does one statistically assess whether there is an increasing trend
> across ages of how health changes over cohorts (as measured by the sign
> and magnitude of the slope for a given age)?
>
> If the data underlying any one slope estimate were known to be independent
> of the data underlying any other estimate, the assessment would be
> straightforward (using a correlation or regression).
>
> A related question stems from my discomfort about only using the slope
> estimate to assess the trend
> across cohorts for a given age. Is there a more "synthetic" way to do
> this, one that is based upon
> the trend determined by all of the regression coefficients (intercept,
> slope, etc.)?
>
> Note that in this analysis, random effects are nuisance parameters.
> Accounting for them is good practice and important, but I doubt that any
> association between measures for a given individual influences the
> biological bottom line of the analysis. Nonetheless, I do not want to
> whistle past the graveyard.
>
> Many thanks in advance for suggestions!
>
> --
> Steven Orzack