[R-sig-ME] Mixed model correlation structure for unbalanced, longitudinal data

Andy Flies andyflies at gmail.com
Thu Jul 5 18:22:40 CEST 2012

> Message: 2
> Date: Tue, 03 Jul 2012 17:47:51 -0400
> From: Andy Flies <andyflies at gmail.com>
> To: r-sig-mixed-models at r-project.org
> Subject: [R-sig-ME] Mixed model correlation structure for unbalanced
> 	longitudinal data
> Message-ID: <4FF36887.5050702 at gmail.com>
> Content-Type: text/plain; charset=windows-1252; format=flowed
> Dear R users,
> I have data from a long-term study that has opportunistically collected
> samples over the past 10 years. My data set is highly unbalanced because
> of the opportunistic sample collection. I have a single sample from 19
> individuals, 2 samples from 4 individuals, and 3 samples from 2
> individuals. I know that lmer can accommodate unbalanced data sets, but I
> am unsure if my data set is too unbalanced.
> I am testing if social rank, reproductive status, and age affect my
> response variables. I also need to determine if sample collection
> parameters such as sample date and the time from anesthetizing the
> animal to the time the sample was collected affect the response variables.
> Here are what I see as potential options:
> 1) Use a mixed model with subject as random intercept and sample date as
> random slope to account for potential temporal autocorrelation within
> the repeat samples.
> lmer(y ~ 1 + x1 + x2 + x3 + ... + (1 + date | subject))
> 2) Use a mixed model with subject as random intercept. Initial data
> exploration does not show any obvious temporal autocorrelation.
> lmer(y ~ 1 + x1 + x2 + x3 + ... + (1 | subject))
> 3) Use a GEE and specify an autoregressive correlation structure. I think
> this would be a good option, but from what I have found in the
> literature, my sample size is too small for this.
> 4) Use the mean for each individual and use a standard linear model. This
> option is not good because it does not allow me to include reproductive
> status as a predictor because reproductive status changes between samples.
> 5) Use only a single sample from each individual in standard linear
> model. This option is not good because my already limited sample size
> would be further reduced.
> Please let me know which of the above options would be best or if you
> can suggest a better option. Any advice or literature references are
> sincerely appreciated.
> Thanks,
> Andy
>
> Andy, do I understand correctly that you have 33 observations in total? If so, then
> I don't want to be the bogeyman, but seriously consider simplifying all these
> models. Option 4 with only 1 or 2 covariates would be my choice. Ask yourself
> whether it makes sense to analyze these data at all; perhaps make only some
> simple graphs?
> Alain
> -- 
> Dr. Alain F. Zuur

Hello Alain,

Thank you for responding to my question.

My full data set has 96 observations from 72 individuals. The subset of 
data I referred to in my initial question was a subset that included 
observations on adult females that have no missing values in any of the 
covariates. I am primarily interested in the following question: Does 
social rank affect my response variables? Social rank data are available 
for all observations, whereas some of the covariates have missing values, 
so the subset sample size increases if I drop covariates from the 
analysis. Some of the covariates, such as age and sample collection 
date, are correlated for the repeated measures, so potential 
collinearity issues might justify dropping one of the variables from the 
analysis anyway. Additionally, simple graphs of the response variables 
vs. the covariates do not suggest two-way relationships in most cases. 
Based on the techniques outlined in your data exploration methods paper, 
I agree that it would be better to keep only 1 or 2 covariates in the 
analysis and use the collapsed mean of repeated measures, rather than 
using a mixed model approach.
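
The collapsed-mean approach described above could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the column names (subject, rank, y) are placeholders rather than the real variables, and social rank is assumed constant within an individual, which the thread does not state.

```r
# Simulated data mirroring the subset described earlier in the thread:
# 19 individuals with 1 sample, 4 with 2 samples, 2 with 3 samples.
set.seed(1)
n_samples <- c(rep(1, 19), rep(2, 4), rep(3, 2))
subject   <- rep(seq_along(n_samples), times = n_samples)

# Assumption: one social rank per individual, unchanged across samples.
rank_by_subject <- sample(seq_along(n_samples))

dat <- data.frame(
  subject = factor(subject),
  rank    = rank_by_subject[subject],
  y       = rnorm(length(subject))   # placeholder response
)

# Collapse repeated measures to one row per individual.
means <- aggregate(cbind(y, rank) ~ subject, data = dat, FUN = mean)

# Ordinary linear model on the collapsed data, one covariate only.
fit <- lm(y ~ rank, data = means)
summary(fit)
```

Because individuals contribute between one and three samples, the per-individual means are not equally precise; whether that matters enough to weight them is a separate modelling decision.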

I initially planned to include the sample collection date as a covariate 
in the subset models to assess whether or not the sample collection date 
is important in any of the data subsets. However, there is no biological 
reason to expect the effect of sample collection date would be different 
in any of the data subsets (i.e. males, females, juveniles, adults). I 
think it would be more appropriate to assess the effect of sample 
collection date in the full data set and then not include it as a 
covariate in the subset analysis.
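
One way to check the sample collection date in the full data set, sketched here with the random-intercept form from option 2 of the original question, is to compare models with and without the date term. Everything below is hypothetical: the data are simulated, the split of repeat samples across the 72 individuals is invented (the thread gives only the totals), and the variable names (years, subject, y) are placeholders. It assumes the lme4 package is installed.

```r
library(lme4)  # provides lmer()

set.seed(2)
# Invented breakdown that yields 96 observations from 72 individuals.
n_samples <- c(rep(1, 52), rep(2, 16), rep(3, 4))
subject   <- rep(seq_along(n_samples), times = n_samples)

subj_eff <- rnorm(length(n_samples))        # simulated between-individual variation
years    <- runif(length(subject), 0, 10)   # sample date as years since study start

full <- data.frame(
  subject = factor(subject),
  years   = years,
  y       = 0.1 * years + subj_eff[subject] + rnorm(length(subject), sd = 0.5)
)

# Fit with ML (REML = FALSE) so the two fixed-effect structures are comparable.
m0 <- lmer(y ~ 1     + (1 | subject), data = full, REML = FALSE)
m1 <- lmer(y ~ years + (1 | subject), data = full, REML = FALSE)

# Likelihood-ratio comparison for the date term.
lrt <- anova(m0, m1)
print(lrt)
```

If the date term contributes little in the full data, that would be consistent with leaving it out of the subset analyses, as proposed above.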

Thank you,

