[R-sig-ME] help

Ben Bolker bbolker at gmail.com
Wed Jan 8 04:11:23 CET 2014


 <jersa at ...> writes:

> 
> Dear glmer experts,
> 
> I would be very happy if someone could help with the following problem.

> I have the following data: I planted seed bags in the vicinity of 10
> mother plants in three directions and at 4 distances (10, 31, 56,
> 100 cm). The germination success (0/1) was assessed by extracting one
> seed bag per microsite in each of the next three years. I am interested
> in the effect of distance on germination success and possible
> differences in germination between years. The data have lots of 0
> values.

> I originally used the following syntax:
> glmer(germination~distance+(1|plant/direction/year),
>    family=binomial,data=seed)

  Logically, direction and year seem more like fixed effects to me
(see http://glmm.wikidot.com/faq#fixed_vs_random for more discussion),
but treating them as fixed leads to some serious overparameterization
problems, so you may actually be better off treating them as grouping
factors, as you are here.

  Since you have a randomized-block design (all levels of fixed
effects are replicated within every block), you could *in principle*
fit a full model:

glmer(germination~distance*direction*year+
    (distance*direction*year|plant), ...)

that accounts for the variation in all effects among plants, but
it certainly won't be practical -- there are 36 combinations
of year/direction/distance, and the random effect here would try
to estimate all of the variances and correlations among them, so you'd
have 36 fixed-effect parameters and 36*37/2 = 666 random-effect
parameters -- somewhat crazy.  You have a total of 360 observations, but
if you have "lots of zeros" then the effective sample size is more
appropriately considered as the number of successful germinations
(see Harrell, _Regression Modeling Strategies_).  If we suppose you
have 10% germination overall, that's about N = 36 germinations, so you
shouldn't be trying to fit more than three or four (approx. N/10)
parameters to this data set, and you're going to have some
difficulty ...
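
  The arithmetic above can be sketched in a few lines of base R.  The
design sizes come from the original post; the 10% germination rate is only
the supposition made above, and the variable names are made up for
illustration:

```r
# Design sizes from the post: 10 plants x 3 directions x 4 distances x 3 years
n_plants <- 10; n_dir <- 3; n_dist <- 4; n_year <- 3

n_cells  <- n_dir * n_dist * n_year        # year/direction/distance combinations
n_fixed  <- n_cells                        # one fixed-effect parameter per cell
n_random <- n_cells * (n_cells + 1) / 2    # variances + covariances of a 36 x 36 matrix
n_obs    <- n_plants * n_cells             # total seed bags

p_germ   <- 0.10                           # ASSUMED overall germination rate
n_events <- n_obs * p_germ                 # effective sample size: expected germinations
budget   <- n_events / 10                  # rough rule of thumb: ~1 parameter per 10 events

c(cells = n_cells, random = n_random, obs = n_obs,
  events = n_events, budget = budget)
```

With these numbers the parameter budget comes out at 3.6, against 36 + 666
parameters for the full model.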

  Even

  germination~(distance+direction+year)^2+ (1|plant)

which fits all the two-way interactions between distance/direction/year
is way too complex ...
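
  To see how many fixed-effect coefficients that implies, one can count the
columns of the model matrix on a skeleton of the design.  This is a sketch
that assumes distance, direction and year are all treated as factors; the
level labels are made up:

```r
# Hypothetical design grid: numbers of levels follow the post, labels are invented
design <- expand.grid(
  distance  = factor(c(10, 31, 56, 100)),
  direction = factor(c("A", "B", "C")),
  year      = factor(1:3)
)

# Main effects plus all two-way interactions
X2 <- model.matrix(~ (distance + direction + year)^2, data = design)
# Full factorial model, for comparison
X3 <- model.matrix(~ distance * direction * year, data = design)

ncol(X2)  # number of fixed-effect coefficients, two-way model
ncol(X3)  # number of fixed-effect coefficients, full model
```

The two-way model needs 24 coefficients (1 + 3 + 2 + 2 + 6 + 6 + 4) and the
full model 36, both far beyond three or four.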

  The logical problem with your

 plant/direction/year 

specification is that it assumes that the
effects of direction can only vary within plants, not consistently
across plants (maybe reasonable if your directions differ for each
plant and are not e.g. North/South/West), and, worse, that the
effect of year can only vary within plant and direction and not
overall.  It's tempting to use (1|plant/direction)+(1|year), but
then you'll be in trouble because it's hard to estimate a variance
from three points (you'll probably end up concluding, wrongly, that
there's zero variance across years).  Logically you could add year
as a fixed effect, but that then costs another two parameters, which
you can hardly afford to spend ...
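
  For reference, (1|plant/direction/year) is shorthand for
(1|plant) + (1|plant:direction) + (1|plant:direction:year).  A sketch of how
many groups each term has, using a made-up skeleton of the design (level
labels are hypothetical):

```r
# Hypothetical skeleton of the design: one row per seed bag
seed <- expand.grid(plant = factor(1:10),
                    direction = factor(c("A", "B", "C")),
                    year = factor(1:3),
                    distance = c(10, 31, 56, 100))

# Number of grouping levels behind each random-effect term
n_groups <- c(
  plant                = nlevels(seed$plant),
  plant_direction      = nlevels(interaction(seed$plant, seed$direction,
                                             drop = TRUE)),
  plant_direction_year = nlevels(interaction(seed$plant, seed$direction,
                                             seed$year, drop = TRUE)),
  year_alone           = nlevels(seed$year)
)
n_groups
```

The nested terms have 10, 30 and 90 groups, but a stand-alone (1|year) term
has only 3, which is why its variance is so hard to estimate.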

  To get back to your original question about interactions -- unless
they're very large, I think you're going to have a hard time detecting
them in any case with a data set of this size.  What I might do is use
something close to your original model, or perhaps

  ~distance+year+(1|plant/direction)

or

   ~distance+year+(distance|plant)

(since distance is your variable of primary interest, you really
should be trying to allow for among-plant variation in it -- see
Schielzeth and Forstmeier 2009).  The second (random-slope) model is
not going to be practical unless you treat distance as a continuous
variable, though.
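
  A minimal sketch of fitting these two candidates with lme4, on SIMULATED
data (not the poster's).  The effect sizes, the rescaled distance_m column
and the object names are all assumptions for illustration, and the fits run
only if lme4 is installed:

```r
set.seed(1)
# Simulated stand-in for the real data: 10 plants x 3 directions x 3 years x 4 distances
seed <- expand.grid(plant = factor(1:10),
                    direction = factor(c("A", "B", "C")),
                    year = factor(1:3),
                    distance = c(10, 31, 56, 100))
seed$distance_m <- seed$distance / 100     # continuous distance, rescaled to metres

# ASSUMED truth: germination declines with distance, baseline varies among plants
plant_eff <- rnorm(10, sd = 0.5)
eta <- -1.5 - 1.0 * seed$distance_m + plant_eff[as.integer(seed$plant)]
seed$germination <- rbinom(nrow(seed), size = 1, prob = plogis(eta))

if (requireNamespace("lme4", quietly = TRUE)) {
  # Random intercepts for plant and for direction within plant
  m_nested <- lme4::glmer(germination ~ distance_m + year + (1 | plant/direction),
                          family = binomial, data = seed)
  # Random intercept and distance slope among plants
  m_slope <- lme4::glmer(germination ~ distance_m + year + (distance_m | plant),
                         family = binomial, data = seed)
  print(summary(m_slope))
}
```

With distance continuous, both models have four fixed-effect coefficients;
the random-slope model adds two variances and one correlation.  With this
little information, singular-fit messages would not be surprising.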

  Bottom line: I would try to do something fairly simple and sensible,
*LOOK AT YOUR DATA* to try to see what the main patterns are, and
hope that large interactions etc. will emerge in the model diagnostic
plots if they're there.
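
  One simple way to start looking, sketched in base R on placeholder data
(the response here is random noise at an assumed 10% rate, so substitute
the real data frame):

```r
set.seed(2)
# Placeholder data with the same layout as the design in the post
seed <- expand.grid(plant = factor(1:10),
                    direction = factor(c("A", "B", "C")),
                    year = factor(1:3),
                    distance = c(10, 31, 56, 100))
seed$germination <- rbinom(nrow(seed), size = 1, prob = 0.10)  # ASSUMED rate

# Proportion germinated in each distance x year cell (30 seed bags per cell)
prop <- with(seed, tapply(germination,
                          list(distance = distance, year = year), mean))
round(prop, 2)

# Germinations per plant: plants with no events contribute little information
events_by_plant <- with(seed, tapply(germination, plant, sum))
events_by_plant
```

A table like this shows at a glance whether germination changes with
distance in the same way every year.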

  good luck
   Ben Bolker


