[R-sig-ME] mixed models with very few measurements by subject

Fri Nov 8 17:03:33 CET 2019

Dear list,

sorry for bothering.

I was presented with this type of data:

Abundance = my response variable, 300 observations per species, no NAs

habitat_type = a fixed effect, a factor with 9 levels.

sample_location = a random effect, a factor with 150 levels. I assume 
there is enough unmeasured variability to warrant treating this as a 
random factor.

landscape = another random factor with three levels, within which 
sample_location is nested.

Notice that not every sample_location or landscape contains all levels 
of habitat_type.

Every sample_location was measured twice, one year apart. In principle, 
year can be coded as a factor as well, to account for temporal 
variability. Initial analysis showed there is very little temporal 
variability.
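
To make the layout concrete, here is a minimal sketch in R of how I understand the design. It uses simulated numbers only: the balanced 50 locations per landscape, the single habitat type per location, and the Poisson counts are assumptions for illustration, not properties of the real data.

```r
# Hypothetical layout: 150 locations x 2 yearly visits = 300 rows.
# Assumptions (not from the real data): 50 locations per landscape,
# one habitat type per location, Poisson-distributed counts.
set.seed(1)
n_loc <- 150
loc <- data.frame(
  sample_location = factor(sprintf("loc%03d", seq_len(n_loc))),
  landscape       = factor(rep(c("A", "B", "C"), each = n_loc / 3)),
  habitat_type    = factor(sample(rep_len(paste0("h", 1:9), n_loc)))
)

# Two visits per location; year is coded as a factor, not a number.
# merge() without shared column names returns every location/year pair.
D <- merge(loc, data.frame(year = factor(c(1, 2))))
D$Abundance <- rpois(nrow(D), lambda = 20)

nrow(D)                                  # 300 rows
table(table(D$sample_location))          # rows per location
with(D, table(landscape, habitat_type))  # zero cells would flag missing combinations
```

The last table is the quickest way I know to see which habitat types are absent from which landscape.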

But then I am left with only one observation per location, and I was 
reading 
(https://stats.stackexchange.com/questions/242821/how-will-random-effects-with-only-1-observation-affect-a-generalized-linear-mixe) 
that in this situation residual errors and random effects may be 
confounded. Landscape has 50 observations, but only three groups, which 
I think is also not a wise option, as per 
https://stats.stackexchange.com/questions/37647/what-is-the-minimum-recommended-number-of-groups-for-a-random-effects-factor.

I am interested in Abundance ~ habitat_type, i.e. whether mean abundance 
differs between habitat types. I first ignored sample_location entirely:

library(multcomp); library(sandwich)
mod <- aov(Abundance ~ habitat_type, data = D)
res <- glht(mod, linfct = mcp(habitat_type = "Tukey"), vcov = vcovHC)

And then I compared this to

amod <- lme(fixed = Abundance ~ habitat_type, data = D,
            random = ~ 1 | sample_location, method = "ML")
means <- emmeans(amod, ~ habitat_type)

There are very few differences between the two approaches. I also 
ignored landscape at this level.

My Questions:

1. Are sample_location (many subjects, few observations) and landscape 
(few groups, many observations) suitable candidates to be modelled as 
random effects?

2. Can their nestedness save me, and how would I code 
landscape:sample_location?

3. Would it be better to code the locations as coordinates and check for 
different correlation structures in gls?
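
To make question 2 concrete: as far as I understand, nlme expresses nesting with the / operator in the grouping part of the random argument. Below is a minimal sketch on simulated data. The effect sizes, the balanced 50 locations per landscape, and the object names m_loc and m_nested are all made up for illustration, so please correct me if this is not the right way to code it.

```r
library(nlme)  # recommended package, normally installed with R

set.seed(42)
n_loc <- 150

# Simulated stand-in for my data; every number here is an assumption.
loc <- data.frame(
  sample_location = factor(seq_len(n_loc)),
  landscape       = factor(rep(c("A", "B", "C"), each = n_loc / 3)),
  habitat_type    = factor(sample(rep_len(paste0("h", 1:9), n_loc))),
  loc_effect      = rnorm(n_loc, sd = 2)   # location-level deviation
)
land_effect <- c(A = -3, B = 0, C = 3)     # landscape-level deviation

# Two yearly visits per location (cross join of locations and years)
D <- merge(loc, data.frame(year = factor(1:2)))
D$Abundance <- 20 + as.integer(D$habitat_type) +
  unname(land_effect[as.character(D$landscape)]) +
  D$loc_effect + rnorm(nrow(D))

# Random intercept for location only (the model I fitted so far)
m_loc <- lme(Abundance ~ habitat_type, data = D,
             random = ~ 1 | sample_location, method = "ML")

# Random intercepts for landscape and for location nested in landscape
m_nested <- lme(Abundance ~ habitat_type, data = D,
                random = ~ 1 | landscape/sample_location, method = "ML")

VarCorr(m_nested)        # variance components per grouping level
anova(m_loc, m_nested)   # likelihood-based comparison of the two fits
```

With only three landscapes I would not trust the landscape variance estimate much, which is really the core of question 1.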

Thank you for your kind advice!

-- 
Dr. Tim Richter-Heitmann

University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069