[R-sig-ME] model advice

Mon Aug 1 04:06:08 CEST 2016

On 16-07-27 10:03 PM, Guy,Travis J wrote:
> Hello!
>

> I'm a master's student studying pollination networks. I have been
furiously trying to learn about linear mixed models and glmms, but I
have some specific questions relating to my project analysis that I am
hoping someone can help me with.

> Here's the short of my project. I can provide more details if need
  be. I am looking at 16 pollination metrics (ex. specialization). A
  few of the metrics are count data (ex. floral abundance) and several
  are proportions (limited to be between 0 and 1). I am interested in
  how rainfall (high and low location) and wildlife exclusion
  (treatment) affect the pollination metrics. I have constructed 12
  networks in total: 6 networks in the low rainfall area, with 3
  networks with wildlife excluded and 3 networks allowing
  wildlife, and 6 networks in the high rainfall area, again
  with 3 networks with wildlife excluded and 3 with wildlife
  included. So sample size is obviously small. It's a block design
  with 3 blocks in the low rainfall and 3 blocks in the south
  location. Each block has the wildlife excluded treatment and the
  wildlife allowed treatment.

> Here are my questions:

> The majority of my metrics fit model assumptions (normality of
  residuals, variance within groups, normality within groups,
  normality of random effects, and linearity/absence of
  heteroskedasticity). However, I have some where normality appears to
  be violated and the residuals vs fitted plot is no good. Various
  transformations (log, ln, sqrt, arcsin(sqrt)) don't seem to help.
  From reading papers by Dr. Ben Bolker, this is where it appears
  GLMMs come in.

> So for the metrics that fit model assumptions my plan is to fit this model

  metric.model <- lmer(metric ~ Treatment + Location + (1 |Blocks),
    data = UHURUnets)

>  but for those where model assumptions aren't met, I'm not sure how
> one picks which exponential family to use and which link to use. How
> does one go about deciding what family and link to use?

   This is really a general question about generalized linear models,
not about GLMMs - there are a fair number of questions (& answers)
on CrossValidated:

http://stats.stackexchange.com/search?q=glm+which+distribution

> I read in Dr. Bolker's TREE paper that binomial distribution and
  logit link are best for proportions. Is this generally the case?

   That depends.  If you know the denominator (i.e. maximum possible
number, also referred to as N), which would seem to be the case (in
your case it would be the total number of species available for
pollination, I guess?), then binomial/logit makes sense.

  If your response is weighted (as suggested by
the variable name) it might get a little tricky, but as long
as it seems sensible to set a maximum number on the possible
responses it should be OK (although you will get warnings).
You do need to include the N, in this case probably via a 'weights'
argument.

> NODF.M1 <- glmer(weighted_NODF ~ Treatment + Location + (1|Blocks),
  data = UHURUnets, family = binomial(link = "logit"))?
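
A minimal sketch of the 'weights' approach, using simulated data in
place of UHURUnets. The data frame `dd` and the columns `N`,
`successes` and `prop` are made up for illustration; what the real
denominator should be depends on how the metric is defined.

```r
library(lme4)

set.seed(101)
# Simulated stand-in for the 12 networks (all values are made up):
# 6 blocks x 2 treatments, blocks 1-3 in one location, 4-6 in the other
dd <- expand.grid(Treatment = c("excluded", "open"), Blocks = factor(1:6))
dd$Location <- factor(ifelse(as.integer(dd$Blocks) <= 3, "low", "high"))
b <- rnorm(6, sd = 0.5)                 # hypothetical block effects
dd$N <- rpois(nrow(dd), 30) + 1         # hypothetical denominator
dd$successes <- rbinom(nrow(dd), size = dd$N,
                       prob = plogis(-0.4 + b[as.integer(dd$Blocks)]))
dd$prop <- dd$successes / dd$N

# Proportion as the response, denominator supplied via 'weights'
m_w <- glmer(prop ~ Treatment + Location + (1 | Blocks),
             data = dd, family = binomial(link = "logit"), weights = N)

# Same model written as a two-column (successes, failures) response
m_c <- glmer(cbind(successes, N - successes) ~ Treatment + Location +
               (1 | Blocks),
             data = dd, family = binomial(link = "logit"))

# The two specifications should give the same fixed-effect estimates
cbind(weights = fixef(m_w), cbind_form = fixef(m_c))
```

If only a proportion is available and no sensible N exists, this
approach does not apply directly.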

> For the count data (ex. floral abundance, insect abundance), it
  seems like I should use Poisson and log link according to that same
  paper.

> No.Fl.units.M1 <- glmer(number_of_floral_units ~ Treatment +
  Location + (1|Blocks), data = UHURUnets, family = poisson(link =
  "log"))?

Seems reasonable, although you should make sure to account for
overdispersion.
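
One rough way to check for overdispersion, sketched on simulated
counts (the data frame `dd`, the column `count` and the helper
`overdisp_ratio` are illustrative names, not part of lme4): compare
the sum of squared Pearson residuals to the residual degrees of
freedom. An observation-level random effect is one common remedy
among several.

```r
library(lme4)

set.seed(102)
# Simulated stand-in for the 12 networks (all values are made up)
dd <- expand.grid(Treatment = c("excluded", "open"), Blocks = factor(1:6))
dd$Location <- factor(ifelse(as.integer(dd$Blocks) <= 3, "low", "high"))
dd$count <- rnbinom(nrow(dd), mu = 20, size = 2)  # overdispersed counts

m_pois <- glmer(count ~ Treatment + Location + (1 | Blocks),
                data = dd, family = poisson(link = "log"))

# Ratio of the Pearson chi-square statistic to the residual df;
# values well above 1 suggest overdispersion
overdisp_ratio <- function(model) {
  rp <- residuals(model, type = "pearson")
  sum(rp^2) / df.residual(model)
}
overdisp_ratio(m_pois)

# One possible remedy: add an observation-level random effect
dd$obs <- factor(seq_len(nrow(dd)))
m_olre <- glmer(count ~ Treatment + Location + (1 | Blocks) + (1 | obs),
                data = dd, family = poisson(link = "log"))
```

With only 12 observations the ratio is a crude guide rather than a
formal test.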

> But what distribution and link would one use for continuous data
  that is not in proportions?

Generally your best hope for continuous data is a transformation.
You can use a Gamma for data that are positive, but log-transformation
followed by a linear mixed model is often reasonable too.  We would
probably need more information.
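
The two options can be sketched side by side on simulated positive
data (the data frame `dd` and the response `y` are made up; neither
fit is a recommendation for any particular metric):

```r
library(lme4)

set.seed(103)
# Simulated stand-in for the 12 networks (all values are made up)
dd <- expand.grid(Treatment = c("excluded", "open"), Blocks = factor(1:6))
dd$Location <- factor(ifelse(as.integer(dd$Blocks) <= 3, "low", "high"))
b <- rnorm(6, sd = 0.3)                          # hypothetical block effects
mu <- exp(1 + b[as.integer(dd$Blocks)])          # hypothetical means
dd$y <- rgamma(nrow(dd), shape = 5, rate = 5 / mu)  # positive, continuous

# Option 1: Gamma GLMM with a log link
m_gamma <- glmer(y ~ Treatment + Location + (1 | Blocks),
                 data = dd, family = Gamma(link = "log"))

# Option 2: log-transform the response and fit a linear mixed model
m_loglmm <- lmer(log(y) ~ Treatment + Location + (1 | Blocks), data = dd)

# Both sets of coefficients are on the log scale
cbind(gamma = fixef(m_gamma), log_lmm = fixef(m_loglmm))
```

The intercepts are not directly comparable (mean of the log vs log of
the mean), but the treatment and location effects usually are.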

> And once you have made a GLMM model, I am assuming it is okay that
  this model still does not fit the normality assumptions or the
  residual vs fitted plots. Is this true?

Well, the residuals should still **approximately** fit these
assumptions (the approximation is worst for binary data).

> My models (both glmms and lmer) currently only have random
  intercepts. I have read that it might be wise to also have random
  slopes as well because the pollination metric could vary for each
  treatment and location depending on which block it is in.

Yes, although it can be hard to get enough data to make this
worthwhile.

> So then I believe I would have a model like this
> ?
> Vuln.LL.M3 <- glmer(vulnerability.LL ~ Treatment + Location + (1 +
Treatment|Blocks) + (1 + Location|Blocks), family = gaussian(link =
log), data = UHURUnets)

> I am not sure if this is correct. I get 2 warnings (failed to converge
and unable to evaluate scaled gradient). Interestingly I appear to not
get these warnings if I am running linear mixed models (lmer). Am I
doing this correctly?

  Probably. There are lots of false positives.  See ?convergence
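
Two checks along the lines of what that help page describes, sketched
on a simulated Poisson model rather than the model above (`dd` and
`count` are made-up names): restart the fit from the previous
estimates, and refit with a different optimizer. If the estimates
agree closely, the warning is more likely to be a false positive.

```r
library(lme4)

set.seed(104)
# Simulated stand-in for the 12 networks (all values are made up)
dd <- expand.grid(Treatment = c("excluded", "open"), Blocks = factor(1:6))
dd$Location <- factor(ifelse(as.integer(dd$Blocks) <= 3, "low", "high"))
b <- rnorm(6, sd = 0.5)                       # hypothetical block effects
dd$count <- rpois(nrow(dd), lambda = exp(3 + b[as.integer(dd$Blocks)]))

m1 <- glmer(count ~ Treatment + Location + (1 | Blocks),
            data = dd, family = poisson(link = "log"))

# Check 1: restart from the estimates of the first fit
ss <- getME(m1, c("theta", "fixef"))
m2 <- update(m1, start = ss)

# Check 2: refit with a different optimizer
m3 <- update(m1, control = glmerControl(optimizer = "bobyqa"))

# Compare the fixed effects across the three fits
cbind(first = fixef(m1), restart = fixef(m2), bobyqa = fixef(m3))

# A singular fit (variance estimated as zero) is a separate issue
isSingular(m1)
```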

> Lastly, is it appropriate to use interaction terms in GLMMs and lmers?
I imagine that the rainfall level may interact with the treatment to
influence the pollination metric.

>  metric.model <- glmer(metric ~ Treatment*Location + (1 |Blocks), data
= UHURUnets, family = gaussian(link = log))??
>

  Definitely OK.
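
One way to ask whether the interaction is worth keeping is to compare
the additive and interaction models with a likelihood ratio test. A
sketch on simulated data (`dd` and `metric` are made up; both models
are fitted with ML, not REML, so the comparison is valid):

```r
library(lme4)

set.seed(105)
# Simulated stand-in for the 12 networks (all values are made up)
dd <- expand.grid(Treatment = c("excluded", "open"), Blocks = factor(1:6))
dd$Location <- factor(ifelse(as.integer(dd$Blocks) <= 3, "low", "high"))
b <- rnorm(6, sd = 0.5)                       # hypothetical block effects
dd$metric <- 2 + b[as.integer(dd$Blocks)] + rnorm(nrow(dd), sd = 0.3)

# Additive model vs model with the Treatment x Location interaction
m_add <- lmer(metric ~ Treatment + Location + (1 | Blocks),
              data = dd, REML = FALSE)
m_int <- lmer(metric ~ Treatment * Location + (1 | Blocks),
              data = dd, REML = FALSE)

# Likelihood ratio test for the interaction term
anova(m_add, m_int)
```

With 12 observations the chi-square approximation behind this test
is rough, so the p-value should be read with caution.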