[R-sig-ME] A modeling question for the ecologists

Fri Sep 21 03:26:37 CEST 2012

This message is directed especially toward the ecologists like Ben Bolker, but I'd welcome any advice . . .

We have data on the harvests of wildlife.  For approximately 5,000 harvested animals (and thus 5,000 rows in the data frame), we have these variables:
1. "Species" (about 20 in all)
2. "Activity" (whether the harvested species is predominantly diurnal or nocturnal)
3. "Time" (the time of day at which the animal was harvested)
4. "Dogs" (whether or not dogs were present and presumably assisting when the animal was captured)

The working hypothesis is that dogs increase harvests of nocturnal species because they can sniff them out during the day when they're sleeping and track them at night when vision is limited (almost all nocturnal hunting involves dogs in this setting).

So initially I was inclined to specify a model:

library(lme4)
model <- glmer(Dogs ~ Activity * Time + (1 | Species), family = binomial, data = d)

That was partly because "Activity" is essentially a species-level variable, so it felt appropriate to include it as a predictor with a random effect for "Species."
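
To make the species-level point concrete, here is a minimal sketch using a made-up toy data frame (not our real harvest data). The species labels, the values, and the coding of "Time" as a numeric hour of day are all assumptions for illustration. It simply checks that each species carries exactly one "Activity" value:

```r
# Toy data with made-up values; column names follow the variable list above.
d <- data.frame(
  Species  = c("A", "A", "B", "B", "C", "C"),
  Activity = c("diurnal", "diurnal", "nocturnal", "nocturnal",
               "diurnal", "diurnal"),
  Time     = c(9.5, 14.0, 22.0, 11.0, 7.0, 16.5),  # hour of day (assumed coding)
  Dogs     = c(0, 1, 1, 1, 0, 0)                   # 1 = dogs present
)

# Count the distinct Activity values within each species.
n_activity <- tapply(d$Activity, d$Species, function(x) length(unique(x)))

# TRUE when Activity never varies within a species, i.e. it is species-level.
all(n_activity == 1)
```

If that check holds on the real data, "Activity" carries no within-species variation, which is why it sits naturally alongside the "Species" random effect.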

Intuitively, though, we tend to think of "Activity" as the outcome variable and the presence of dogs as a predictor, which raises or lowers the preponderance of nocturnal species in the harvest.

Is there a good way to model these data while retaining that sense of causality -- in other words, could we put Activity on the left side of the equation?