[R-sig-ME] How to use nlmer on a dataset with multiple fixed and random effects

Fri May 11 15:42:43 CEST 2012

Lauren Hooton <lauren.hooton at ...> writes:

> >> I am trying to model the effect of weather variables on bat activity
> >> (passes/hour) over three years and multiple geographic locations.
> >> Specifically, the effects are:
> >>
> >> Fixed = temperature, wind speed, wind direction, pressure,
> >> precipitation, relative humidity
> >> Random = year, week, detector, hour
> >> (Within each year there were multiple detectors recording bat
> >> activity, and these detectors (locations) changed each year).
> >>
> >> I started out using glmer() in lme4, with the following code:
> >> LACI.model.8 <-
> >> glmer(LACI~AvgTemp+AvgSpeed+AvgDirection+Pressure+
> >>   Precip+RH+(1|year)+(1|weeks_July1)+(1|detector)+(1|GMT_hour),
> >> data=allbatwxstd, family=poisson)
> >
> >  A quick question: can you use a quadratic function of one or
> > more of your continuous predictors in your model?  That is nonlinear
> > in terms of the original predictor, but it is still a linear *model*
> > (i.e. it is linear in terms of the parameters of the model).  You can
> > use either (e.g.) Pressure + I(Pressure^2), or (more numerically
> > stable and statistically sounder but possibly harder to interpret)
> > poly(Pressure,2) to add a quadratic term in Pressure ...
> >
> >  (Sorry if this isn't relevant, I'm posting in a hurry)
> >
> Thanks for responding - a few others have also suggested including
> quadratic terms.  However, I'm not sure that quadratic is the best fit
> for the data either....the residual plots are still not great.  My
> uncertainty as to what would be the best method for my data led me to
> want to pursue non-linear methods.  Perhaps I should have phrased my
> question as: How do you know which model is the best? Ie: a glmer with
> or without quadratic terms? A non-linear model?
> Is the best method to look at the fitted values vs the residuals?

   "How do you know which model is the best" is a very good question,
and one that's hard to answer in a single cut-and-dried way.

* Looking at residual plots is certainly a good first step.
It's handy for two reasons: (1) since it removes the signal (fitted
values) from the response, it makes it easier to see the deviations;
(2) you can *always* plot residuals vs fitted values -- you don't
need to know anything about the structure of the data set.

* However, I often like overlaying the _predicted_ values on the
original data as well.  This puts deviations of predicted vs
actual in context (you may be worrying about a deviation that is
small in absolute terms), and often makes it easier to see where
in the data set the deviation occurs (because in this case you
have to plot the predicted and observed values against the values
of the predictor variables).

* One thing to remember about GLMMs is that, due to the link
functions, the models *are* nonlinear (in a specific, restricted
sense).  If you're using a Poisson model, then the default is
a log link -- so by default you're fitting an exponential model
to the data.  If you include a quadratic term, then the fitted
curve is 'Gaussian' in shape (i.e. proportional to exp(-x^2) when
the quadratic coefficient is negative; this refers to the shape of
the curve, not to the distribution of the residuals).  It is true
that it's hard to fit some patterns this way, e.g. an increasing
but saturating curve (although the left half of a Gaussian looks
a bit like this).

* People do often use quadratic models to try to fit slightly
nonlinear patterns -- it makes sense as the next term in a
polynomial expansion.  If you want a quantitative test, you
can try a likelihood ratio test between the linear and quadratic
models (they are nested), or compare their AIC values.

* The other way to allow for more complex shapes (technically
not 'nonlinear' but allowing for a very broad class of curves)
is to use generalized additive models -- see 'gam' in the mgcv
package.
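
As a rough illustration of the linear-vs-quadratic comparison and the
predicted-vs-observed overlay described above, here is a minimal sketch.
It rests on several assumptions: the data are simulated (the data frame
'dd' and the variables 'temp' and 'passes' are made up, not the bat data
from the original question), and it uses plain glm() without random
effects so that it runs with base R alone.  The same anova() and AIC()
calls should also work on a pair of nested glmer() fits.

```r
## Simulated stand-in data (hypothetical; not the poster's data set)
set.seed(101)
n <- 500
dd <- data.frame(temp = runif(n, -2, 2))
## True mean is hump-shaped in temp: exp(1 + 0.3*x - 0.5*x^2)
dd$passes <- rpois(n, lambda = exp(1 + 0.3 * dd$temp - 0.5 * dd$temp^2))

## Poisson GLMs (default log link): linear vs quadratic on the log scale
m_lin  <- glm(passes ~ temp,          family = poisson, data = dd)
m_quad <- glm(passes ~ poly(temp, 2), family = poisson, data = dd)

## Likelihood ratio test between the two nested models
lrt <- anova(m_lin, m_quad, test = "Chisq")
print(lrt)

## AIC comparison (lower is better)
aic <- AIC(m_lin, m_quad)
print(aic)

## Diagnostic plots, drawn only in an interactive session
if (interactive()) {
  ## (1) Pearson residuals vs fitted values
  plot(fitted(m_quad), residuals(m_quad, type = "pearson"),
       xlab = "fitted", ylab = "Pearson residual")
  abline(h = 0, lty = 2)
  ## (2) predicted values overlaid on the original data
  ord <- order(dd$temp)
  plot(dd$temp, dd$passes, xlab = "temp", ylab = "passes")
  lines(dd$temp[ord], fitted(m_lin)[ord],  lty = 2)
  lines(dd$temp[ord], fitted(m_quad)[ord], lwd = 2)
}
```

With real data, the overlay in (2) is usually the more informative of the
two plots, since it shows where along the predictor the linear fit goes
wrong and whether the discrepancy is large in absolute terms.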