[R-sig-ME] Should I use GLMM?

Ben Bolker bbolker at gmail.com
Wed Apr 30 03:05:14 CEST 2014

Rodrigo Tardin <rhtardin at ...> writes:

> Dear all,
> I am new to spatial analysis and I'm stuck on which model to use
> for my PhD analysis. If somebody could help me, I would appreciate it.
> I am investigating the distribution of Bryde's whales in Rio de Janeiro
> waters and trying to identify the influence of three explanatory variables
> (Depth, Distance to coast and Sea slope). On each day at sea on which I
> observed a whale, I marked a first GPS location for that whale and
> started to follow it, marking a new GPS location every 500 m of the whale's
> movement. I have 22 days of observations, with 7-8 GPS locations for each
> whale on each day. I was told that if I use all of those GPS locations, the
> observations are not independent of each other. Can a GLMM work for me, if
> I include the ID of each whale in the model to account for this? If not,
> can anybody point me in the right direction?

   It's not entirely clear to me what question you're specifically
trying to address with "influence" -- are you trying essentially to
look for habitat preference/selection by whales?  This is itself a
bit of a difficult problem, because your data set contains only
the explanatory variables from the places where you found the whales,
not from places where you didn't find them.  A standard approach to
this (although not without its problems) is to make up a series
of "pseudo-absences" by picking points at random from the plausible
region where you might have found whales, and doing a logistic
regression with the presences and pseudo-absences.  In that case
you should probably pick a set of pseudo-absences for each individual,
then do

  library(lme4)
  glmer(pres ~ depth + dist + slope + (depth + dist + slope | ID),
        family = binomial, data = ...)
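
To make the pseudo-absence step concrete, here is a minimal sketch in base R of how the data frame for the glmer() call above might be assembled. Everything in it is an assumption for illustration: the coordinates are simulated, the bounding box stands in for the "plausible region", one pseudo-absence is drawn per presence, and lookup_covariates() is a hypothetical placeholder for extracting depth, distance and slope from real environmental layers.

```r
set.seed(1)  # reproducible toy example

## Hypothetical presence data: 22 whales, 8 GPS fixes each (made-up coordinates)
n_whales  <- 22
n_fixes   <- 8
lon_range <- c(-43.5, -41.5)   # assumed study-area bounding box
lat_range <- c(-23.5, -22.8)

pres_pts <- data.frame(
  ID  = factor(rep(seq_len(n_whales), each = n_fixes)),
  lon = runif(n_whales * n_fixes, lon_range[1], lon_range[2]),
  lat = runif(n_whales * n_fixes, lat_range[1], lat_range[2])
)

## Placeholder only: in practice these values would be extracted from
## bathymetry/coastline layers, not computed from a formula.
lookup_covariates <- function(lon, lat) {
  data.frame(depth = abs(lat - lat_range[2]) * 100,
             dist  = abs(lat - lat_range[2]) * 111,
             slope = abs(sin(lon * 10)))
}

## Draw n pseudo-absences for one whale, uniformly over the bounding box
sample_pseudo_absences <- function(id, n) {
  data.frame(ID  = id,
             lon = runif(n, lon_range[1], lon_range[2]),
             lat = runif(n, lat_range[1], lat_range[2]))
}

## One pseudo-absence per presence, drawn separately for each individual
abs_pts <- do.call(rbind, lapply(levels(pres_pts$ID), function(id) {
  sample_pseudo_absences(id, sum(pres_pts$ID == id))
}))
abs_pts$ID <- factor(abs_pts$ID, levels = levels(pres_pts$ID))

pres_pts$pres <- 1   # observed locations
abs_pts$pres  <- 0   # pseudo-absences

dat <- rbind(pres_pts, abs_pts)
dat <- cbind(dat, lookup_covariates(dat$lon, dat$lat))
```

The resulting `dat` has the columns pres, depth, dist, slope and ID used in the model formula. The ratio of pseudo-absences to presences and the definition of the sampling region are choices that affect the results, so the 1:1 ratio and rectangular region here should not be read as recommendations.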

* How many whales?
* You may need to worry about temporal/spatial autocorrelation as
well -- also not trivial; aggregating data (at the cost of losing
data) is the simplest solution.  Over what temporal/spatial scales
are these variables similar to each other?
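
One possible reading of "aggregating" is to collapse each whale's track to a single row, for example the mean of each covariate over its GPS fixes. The sketch below assumes that reading; the data frame `tracks` and its values are made up for illustration.

```r
## Hypothetical track data: 3 whales, 4 fixes each (values made up)
tracks <- data.frame(
  ID    = rep(c("w1", "w2", "w3"), each = 4),
  depth = c(20, 22, 25, 21,  40, 42, 38, 40,  15, 15, 17, 13),
  dist  = c(1.0, 1.2, 1.1, 1.1,  3.0, 3.2, 2.8, 3.0,  0.5, 0.7, 0.6, 0.6),
  slope = c(2, 2, 3, 1,  5, 5, 6, 4,  1, 1, 1, 1)
)

## Collapse each whale's fixes to one row of mean covariate values
per_whale <- aggregate(cbind(depth, dist, slope) ~ ID, data = tracks, FUN = mean)
print(per_whale)
```

This leaves one observation per whale, which removes the within-track dependence but also discards the within-track variation, so with 22 days of data it would leave very few rows to model.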

  You should really get some more focused statistical help if at
all possible -- this is not a trivial problem ...

  Ben Bolker
