[R-sig-ME] Mixed model correlation structure for unbalanced, longitudinal data

Wed Jul 4 12:21:54 CEST 2012

------------------------------

Date: Tue, 03 Jul 2012 17:47:51 -0400
From: Andy Flies <andyflies at gmail.com>
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Mixed model correlation structure for unbalanced longitudinal data
Message-ID: <4FF36887.5050702 at gmail.com>

Dear R users,

I have data from a long-term study that has opportunistically collected
samples over the past 10 years. My data set is highly unbalanced because
of the opportunistic sample collection: I have a single sample from 19
individuals, 2 samples from 4 individuals, and 3 samples from 2
individuals. I know that lmer can accommodate unbalanced data sets, but I
am unsure whether my data set is too unbalanced.
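The sample counts above can be tallied in base R as a quick check of the design. This is a minimal sketch: the subject IDs are hypothetical labels, and only the counts come from the post.

```r
# Sample counts per individual, as described above:
# 19 individuals x 1 sample, 4 x 2 samples, 2 x 3 samples
n_per_id <- c(rep(1, 19), rep(2, 4), rep(3, 2))

# One row per sample, labelled by a hypothetical subject ID
subject <- factor(rep(seq_along(n_per_id), times = n_per_id))

n_samples     <- length(subject)    # total number of samples
n_individuals <- nlevels(subject)   # number of distinct individuals
n_repeated    <- sum(n_per_id > 1)  # individuals sampled more than once

print(table(table(subject)))        # how many individuals have 1, 2, or 3 samples
cat("samples:", n_samples, " individuals:", n_individuals,
    " with repeats:", n_repeated, "\n")
```

This gives 33 samples from 25 individuals, only 6 of whom (14 samples) were sampled more than once; those 14 samples are all the information any within-individual structure has to be estimated from.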

I am testing whether social rank, reproductive status, and age affect my
response variables. I also need to determine whether sample collection
parameters, such as the sample date and the time elapsed between
anesthetizing the animal and collecting the sample, affect the response
variables. Here is what I see as the potential options:

1) Use a mixed model with subject as a random intercept and sample date as
a random slope to account for potential temporal autocorrelation within
the repeat samples.
lmer(y ~ 1 + x1 + x2 + x3 + ... + (1 + date | subject))
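A minimal sketch of option 1, assuming the lme4 package is installed. The post does not include the data, so the block simulates stand-in data with the same design; the names y, x1, x2, x3, date, and subject follow the formula above, but all simulated values, scales, and effect sizes are hypothetical. The sketch is meant to show two things: a random slope for date gives each subject its own linear trend over time rather than modelling residual autocorrelation between samples, and with 2 random effects for each of 25 subjects the model has 50 random effects for 33 observations. Current lme4 releases check for this by default (the check.nobs.vs.nRE setting of lmerControl), so the fit is expected to stop with an identifiability error rather than return estimates.

```r
library(lme4)  # assumed installed

# Simulated stand-in data with the post's design; all values are hypothetical
set.seed(1)
n_per_id <- c(rep(1, 19), rep(2, 4), rep(3, 2))
subject  <- factor(rep(seq_along(n_per_id), times = n_per_id))
n        <- length(subject)                 # 33 samples
d <- data.frame(subject = subject,
                date = runif(n, 0, 10),     # hypothetical: years since study start
                x1 = rnorm(n),              # hypothetical stand-in for social rank
                x2 = rbinom(n, 1, 0.5),     # hypothetical 0/1 reproductive status
                x3 = runif(n, 1, 15))       # hypothetical age in years
u   <- rnorm(nlevels(subject), sd = 0.8)    # simulated subject effects
d$y <- 1 + 0.5 * d$x1 + u[as.integer(subject)] + rnorm(n)

# Random intercept + random date slope: 2 random effects per subject
n_re <- 2 * nlevels(d$subject)
cat("observations:", nrow(d), " random effects:", n_re, "\n")

# With more random effects than observations, lme4's default
# lmerControl(check.nobs.vs.nRE = "stop") is expected to refuse the fit;
# tryCatch() returns the error message instead of aborting the script.
fit1 <- tryCatch(
  lmer(y ~ 1 + x1 + x2 + x3 + (1 + date | subject), data = d),
  error = function(e) conditionMessage(e)
)
print(fit1)
```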

2) Use a mixed model with subject as a random intercept. Initial data
exploration does not show any obvious temporal autocorrelation.
lmer(y ~ 1 + x1 + x2 + x3 + ... + (1 | subject))
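A minimal sketch of option 2 under the same assumptions (lme4 installed, simulated stand-in data with the post's design, hypothetical values). The random intercept adds 25 subject effects, fewer than the 33 observations, so lme4 will fit it, but only the 6 individuals with repeat samples carry information about between-subject variation. An estimated subject standard deviation at or near zero (reported as a singular fit in current lme4) would therefore not be surprising.

```r
library(lme4)  # assumed installed

# Simulated stand-in data with the post's design; all values are hypothetical
set.seed(1)
n_per_id <- c(rep(1, 19), rep(2, 4), rep(3, 2))
subject  <- factor(rep(seq_along(n_per_id), times = n_per_id))
n        <- length(subject)                 # 33 samples
d <- data.frame(subject = subject,
                date = runif(n, 0, 10),     # hypothetical: years since study start
                x1 = rnorm(n),              # hypothetical stand-in for social rank
                x2 = rbinom(n, 1, 0.5),     # hypothetical 0/1 reproductive status
                x3 = runif(n, 1, 15))       # hypothetical age in years
u   <- rnorm(nlevels(subject), sd = 0.8)    # simulated subject effects
d$y <- 1 + 0.5 * d$x1 + u[as.integer(subject)] + rnorm(n)

# Random intercept only: one random effect per subject (25 < 33 observations)
fit2 <- lmer(y ~ 1 + x1 + x2 + x3 + (1 | subject), data = d)

print(fixef(fit2))    # intercept and the three slopes
print(VarCorr(fit2))  # between-subject SD versus residual SD
```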

3) Use a GEE (generalized estimating equations) and specify an
autoregressive correlation structure. I think this would be a good option,
but from what I have found in the literature, my sample size is too small
for it.
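A sketch of option 3, assuming the geepack package (the post does not name an implementation; others exist) and simulated stand-in data with the post's design (hypothetical values). Two caveats show up directly in the code: geeglm() expects each subject's rows to be contiguous, and its AR(1) working correlation is defined on the order of observations within a subject (or on the integer waves argument), so samples taken months or years apart are treated as equally spaced. With 25 clusters, 19 of them single samples, the robust standard errors rest on very few clusters, which is consistent with the sample-size concern raised above.

```r
library(geepack)  # assumed installed

# Simulated stand-in data with the post's design; all values are hypothetical
set.seed(1)
n_per_id <- c(rep(1, 19), rep(2, 4), rep(3, 2))
subject  <- factor(rep(seq_along(n_per_id), times = n_per_id))
n        <- length(subject)                 # 33 samples
d <- data.frame(subject = subject,
                date = runif(n, 0, 10),     # hypothetical: years since study start
                x1 = rnorm(n),              # hypothetical stand-in for social rank
                x2 = rbinom(n, 1, 0.5),     # hypothetical 0/1 reproductive status
                x3 = runif(n, 1, 15))       # hypothetical age in years
u   <- rnorm(nlevels(subject), sd = 0.8)    # simulated subject effects
d$y <- 1 + 0.5 * d$x1 + u[as.integer(subject)] + rnorm(n)

# Keep each subject's rows together and in date order within the subject
d <- d[order(d$subject, d$date), ]

# AR(1) working correlation: consecutive rows within a subject are treated
# as one time step apart, whatever the actual gap between sample dates
fit3 <- geeglm(y ~ x1 + x2 + x3, id = subject, data = d,
               family = gaussian, corstr = "ar1")
print(summary(fit3))  # robust (sandwich) SEs based on only 25 clusters
```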

4) Use the mean for each individual in a standard linear model. This
option is not good because reproductive status changes between samples,
so it cannot be included as a predictor once the samples are averaged.
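A minimal sketch of option 4 in base R, on simulated stand-in data with the post's design (all values hypothetical, with reproductive status coded 0/1 as x2). Averaging turns x2 into a proportion for any individual whose status changed between samples, which is the problem described above. The model itself uses a single covariate, in line with the advice in Alain's reply to keep option 4 to one or two covariates.

```r
# Simulated stand-in data with the post's design; all values are hypothetical
set.seed(1)
n_per_id <- c(rep(1, 19), rep(2, 4), rep(3, 2))
subject  <- factor(rep(seq_along(n_per_id), times = n_per_id))
n        <- length(subject)                 # 33 samples
d <- data.frame(subject = subject,
                x1 = rnorm(n),              # hypothetical stand-in for social rank
                x2 = rbinom(n, 1, 0.5),     # hypothetical 0/1 reproductive status
                x3 = runif(n, 1, 15))       # hypothetical age in years
u   <- rnorm(nlevels(subject), sd = 0.8)    # simulated subject effects
d$y <- 1 + 0.5 * d$x1 + u[as.integer(subject)] + rnorm(n)

# Collapse to one row per individual by averaging every variable
means <- aggregate(cbind(y, x1, x2, x3) ~ subject, data = d, FUN = mean)

# Individuals whose 0/1 status changed now have a fractional x2
changed <- means$subject[means$x2 > 0 & means$x2 < 1]
print(changed)

# Plain linear model on the 25 individual means, with a single covariate
fit4 <- lm(y ~ x1, data = means)
print(summary(fit4))
```

An unweighted lm() treats a mean of 3 samples the same as a single sample, even though the mean is less variable; passing the number of samples per individual as weights is one possible adjustment, under the assumption that variation between repeat samples of the same individual dominates.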

5) Use only a single sample from each individual in a standard linear
model. This option is not good because my already limited sample size
would be further reduced.
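A minimal sketch of option 5 in base R on simulated stand-in data with the post's design (hypothetical values). The post leaves open which sample to keep; keeping each individual's earliest-dated sample is an assumption made here for illustration.

```r
# Simulated stand-in data with the post's design; all values are hypothetical
set.seed(1)
n_per_id <- c(rep(1, 19), rep(2, 4), rep(3, 2))
subject  <- factor(rep(seq_along(n_per_id), times = n_per_id))
n        <- length(subject)                 # 33 samples
d <- data.frame(subject = subject,
                date = runif(n, 0, 10),     # hypothetical: years since study start
                x1 = rnorm(n),              # hypothetical stand-in for social rank
                x2 = rbinom(n, 1, 0.5),     # hypothetical 0/1 reproductive status
                x3 = runif(n, 1, 15))       # hypothetical age in years
u   <- rnorm(nlevels(subject), sd = 0.8)    # simulated subject effects
d$y <- 1 + 0.5 * d$x1 + u[as.integer(subject)] + rnorm(n)

# Keep only the earliest-dated sample from each individual
d <- d[order(d$subject, d$date), ]
one_per_id <- d[!duplicated(d$subject), ]
cat("rows before:", nrow(d), " after:", nrow(one_per_id), "\n")

fit5 <- lm(y ~ x1 + x2 + x3, data = one_per_id)
print(summary(fit5))
```

Going from 33 to 25 rows discards 8 samples, all of them from the 6 individuals with repeat measurements.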

Please let me know which of the above options would be best or if you
can suggest a better option. Any advice or literature references are
sincerely appreciated.

Thanks,
Andy

Andy, do I understand correctly that you have 33 observations in total
(19 + 8 + 6)? If so, then I don't want to be the bogeyman, but seriously
consider simplifying all these models. Option 4 with only 1 or 2 covariates
would be my choice. Ask yourself whether it makes sense to analyze these
data at all; perhaps just make some simple graphs?
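A minimal sketch of the kind of simple graph suggested above, in base R on simulated stand-in data with the post's design (all values hypothetical): the response against sample date, with repeat samples from the same individual joined by lines and a 0/1 stand-in for reproductive status shown by symbol. The output file name is arbitrary.

```r
# Simulated stand-in data with the post's design; all values are hypothetical
set.seed(1)
n_per_id <- c(rep(1, 19), rep(2, 4), rep(3, 2))
subject  <- factor(rep(seq_along(n_per_id), times = n_per_id))
n        <- length(subject)                 # 33 samples
d <- data.frame(subject = subject,
                date = runif(n, 0, 10),     # hypothetical: years since study start
                x2 = rbinom(n, 1, 0.5))     # hypothetical 0/1 reproductive status
u   <- rnorm(nlevels(subject), sd = 0.8)    # simulated subject effects
d$y <- 1 + u[as.integer(subject)] + rnorm(n)

out <- file.path(tempdir(), "raw_data.pdf")  # arbitrary output file
pdf(out)
plot(d$date, d$y,
     pch = ifelse(d$x2 == 1, 16, 1),         # filled = status 1, open = status 0
     xlab = "Sample date (years since start)", ylab = "Response")

# Join repeat samples from the same individual, in date order
for (s in levels(d$subject)) {
  di <- d[d$subject == s, ]
  if (nrow(di) > 1) {
    di <- di[order(di$date), ]
    lines(di$date, di$y)
  }
}
legend("topright", pch = c(1, 16), legend = c("status 0", "status 1"))
invisible(dev.off())
```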

Alain

-- 

Dr. Alain F. Zuur
First author of:

1. Analysing Ecological Data (2007).
Zuur, AF, Ieno, EN, and Smith, GM. Springer. 680 p.
URL: www.springer.com/0-387-45967-7

2. Mixed Effects Models and Extensions in Ecology with R (2009).
Zuur, AF, Ieno, EN, Walker, N, Saveliev, AA, and Smith, GM. Springer.
URL: http://www.springer.com/life+sci/ecology/book/978-0-387-87457-9

3. A Beginner's Guide to R (2009).
Zuur, AF, Ieno, EN, and Meesters, EHWG. Springer.
URL: http://www.springer.com/statistics/computational/book/978-0-387-93836-3

4. Zero Inflated Models and Generalized Linear Mixed Models with R (2012).
Zuur, AF, Saveliev, AA, and Ieno, EN.
URL: http://www.highstat.com/book4.htm

Other books: http://www.highstat.com/books.htm

Statistical consultancy, courses, data analysis and software
Highland Statistics Ltd.
6 Laverock road
UK - AB41 6FN Newburgh
Tel: 0044 1358 788177
Email: highstat at highstat.com
URL: www.highstat.com
URL: www.brodgar.com