[R-sig-ME] model advice
Ben Bolker
bbolker at gmail.com
Mon Aug 1 04:06:08 CEST 2016
On 16-07-27 10:03 PM, Guy,Travis J wrote:
> Hello!
>
> I'm a master's student studying pollination networks. I have been
> furiously trying to learn about linear mixed models and GLMMs, but I
> have some specific questions relating to my project analysis that I am
> hoping someone can help me with.
> Here's the short of my project. I can provide more details if need
> be. I am looking at 16 pollination metrics (ex. specialization). A
> few of the metrics are count data (ex. floral abundance) and several
> are proportions (limited to be between 0 and 1). I am interested in
> how rainfall (high and low location) and wildlife exclusion
> (treatment) affect the pollination metrics. I have constructed 12
> networks in total. There are 6 networks in the low rainfall area, 3
> with wildlife excluded and 3 allowing wildlife, and 6 networks in
> the high rainfall area, again 3 with wildlife excluded and 3 with
> wildlife included. So sample size is obviously small. It's a block
> design with 3 blocks in the low rainfall and 3 blocks in the south
> location. Each block has the wildlife excluded treatment and the
> wildlife allowed treatment.
> Here are my questions:
> The majority of my metrics fit model assumptions (normality of
> residuals, variance within groups, normality within groups,
> normality of random effects, and linearity/absence of
> heteroskedasticity). However, I have some where normality appears to
> be violated and the fitted vs residuals plot is no good. Various
> transformations (log, ln, sqrt, arcsin(sqrt)) don't seem to help.
> From reading papers by Dr. Ben Bolker, this is where it appears
> GLMMs come in.
> So for the metrics that fit model assumptions my plan is to fit this model
> metric.model <- lmer(metric ~ Treatment + Location + (1 |Blocks),
> data = UHURUnets)
> but for those where model assumptions aren't met, I'm not sure how
> one picks which exponential family to use and which link to use. How
> does one go about deciding what family and link to use?
This is really a general question about generalized linear models,
not about GLMMs - there are a fair number of questions (& answers)
on CrossValidated:
http://stats.stackexchange.com/search?q=glm+which+distribution
> I read in Dr. Bolker's TREE paper that binomial distribution and
> logit link are best for proportions. Is this generally the case?
That depends. If you know the denominator (i.e. maximum possible
number, also referred to as N), which would seem to be the case (in
your case it would be the total number of species available for
pollination, I guess?), then binomial/logit makes sense.
If your response is weighted (as suggested by the variable name,
weighted_NODF) it might get a little tricky, but as long
as it seems sensible to set a maximum number on the possible
responses it should be OK (although you will get warnings).
You do need to include the N, in this case probably via a 'weights'
argument.
> NODF.M1 <- glmer(weighted_NODF ~ Treatment + Location + (1|Blocks),
> data = UHURUnets, family = binomial(link = "logit"))?
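As a minimal sketch of supplying the denominator N through 'weights', assuming lme4 is installed: the data frame below is a simulated stand-in for UHURUnets, and the columns N, succ and prop are hypothetical names invented for illustration, not part of the original data.

```r
library(lme4)

set.seed(1)
# Simulated stand-in for the design: 6 blocks x 2 treatments = 12 networks,
# blocks 1-3 in the low rainfall location, blocks 4-6 in the high one.
dat <- data.frame(
  Blocks    = factor(rep(1:6, each = 2)),
  Treatment = factor(rep(c("excluded", "open"), times = 6)),
  Location  = factor(rep(c("low", "high"), each = 6))
)
dat$N    <- rep(20, 12)                       # hypothetical known denominator
dat$succ <- rbinom(12, size = dat$N, prob = 0.4)
dat$prop <- dat$succ / dat$N                  # response as a proportion

# Proportion response with the denominator passed via 'weights'
m_prop <- glmer(prop ~ Treatment + Location + (1 | Blocks),
                data = dat, family = binomial(link = "logit"),
                weights = N)

# The same model written with a two-column (successes, failures) response
m_cbind <- glmer(cbind(succ, N - succ) ~ Treatment + Location + (1 | Blocks),
                 data = dat, family = binomial(link = "logit"))

# The two parameterizations should give matching fixed effects
cbind(weights_form = fixef(m_prop), cbind_form = fixef(m_cbind))
```

With so few networks the fit may report a singular random-effect variance; that is a property of the simulated data, not of the 'weights' mechanism.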
> For the count data (ex. floral abundance, insect abundance), it
> seems like I should use Poisson and log link according to that same
> paper.
> No.Fl.units.M1 <- glmer(number_of_floral_units ~ Treatment +
> Location + (1|Blocks), data = UHURUnets, family = poisson(link =
> "log"))?
Seems reasonable, although you should make sure to account for
overdispersion.
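One common way to check for overdispersion in a Poisson GLMM is the ratio of the summed squared Pearson residuals to the residual degrees of freedom, and one common way to account for it is an observation-level random effect. The sketch below assumes lme4 and uses simulated counts; the helper overdisp_ratio and the columns count and obs are hypothetical names for illustration only.

```r
library(lme4)

set.seed(2)
# Simulated stand-in for the 12-network design
dat <- data.frame(
  Blocks    = factor(rep(1:6, each = 2)),
  Treatment = factor(rep(c("excluded", "open"), times = 6)),
  Location  = factor(rep(c("low", "high"), each = 6))
)
# Negative binomial draws give counts that are overdispersed relative to Poisson
dat$count <- rnbinom(12, mu = 30, size = 2)

m_pois <- glmer(count ~ Treatment + Location + (1 | Blocks),
                data = dat, family = poisson(link = "log"))

# Rough diagnostic: Pearson chi-square divided by residual df.
# Values well above 1 suggest overdispersion.
overdisp_ratio <- function(model) {
  rp <- residuals(model, type = "pearson")
  sum(rp^2) / df.residual(model)
}
overdisp_ratio(m_pois)

# One option: add an observation-level random effect (one level per row)
dat$obs <- factor(seq_len(nrow(dat)))
m_olre <- glmer(count ~ Treatment + Location + (1 | Blocks) + (1 | obs),
                data = dat, family = poisson(link = "log"))
VarCorr(m_olre)
```

A negative binomial model is another option; the observation-level effect is shown here only because it stays within glmer.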
> But what distribution and link would one use for continuous data
> that is not in proportions?
Generally your best hope for continuous data is a transformation.
You can use a Gamma for data that are positive, but log-transformation
followed by a linear mixed model is often reasonable too. We would
probably need more information.
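The two options mentioned above for positive continuous data can be sketched side by side. This assumes lme4 and a simulated positive response; the column y and the simulation settings are hypothetical, chosen only so the example runs.

```r
library(lme4)

set.seed(3)
# Simulated stand-in for the 12-network design
dat <- data.frame(
  Blocks    = factor(rep(1:6, each = 2)),
  Treatment = factor(rep(c("excluded", "open"), times = 6)),
  Location  = factor(rep(c("low", "high"), each = 6))
)
# Positive, right-skewed response with a small block effect
block_eff <- rnorm(6, sd = 0.2)
mu <- exp(1.5 + 0.3 * (dat$Treatment == "open") +
          block_eff[as.integer(dat$Blocks)])
dat$y <- rgamma(12, shape = 10, rate = 10 / mu)

# Option 1: log-transform the response, then fit a linear mixed model
m_loglmm <- lmer(log(y) ~ Treatment + Location + (1 | Blocks), data = dat)

# Option 2: Gamma GLMM with a log link on the untransformed response
m_gamma <- glmer(y ~ Treatment + Location + (1 | Blocks),
                 data = dat, family = Gamma(link = "log"))

# Both sets of coefficients are on the log scale, but option 1 models the
# mean of log(y) while option 2 models the log of the mean of y.
cbind(log_lmm = fixef(m_loglmm), gamma_glmm = fixef(m_gamma))
```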
> And once you have made a GLMM model, I am assuming it is okay that
> this model still does not fit the normality assumptions or the
> residual vs fitted plots. Is this true?
well, the residuals should still **approximately** fit these
assumptions (worst for binary data)
> My models (both glmms and lmer) currently only have random
> intercepts. I have read that it might be wise to also have random
> slopes as well because the pollination metric could vary for each
> treatment and location depending on which block it is in.
Yes, although it can be hard to get enough data to make this
worthwhile.
> So then I believe I would have a model like this?
> Vuln.LL.M3 <- glmer(vulnerability.LL ~ Treatment + Location + (1 +
> Treatment|Blocks) + (1 + Location|Blocks), family = gaussian(link =
> log), data = UHURUnets)
> I am not sure if this is correct. I get 2 warnings (failed to converge
> and unable to evaluate scaled gradient). Interestingly I appear to not
> get these warnings if I am running linear mixed models (lmer). Am I
> doing this correctly?
Probably. The convergence checks give lots of false positives. See ?convergence
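One way to judge whether a convergence warning is a false positive is to refit with a different optimizer and see whether the estimates and log-likelihood agree, and to check whether the fit is singular. The sketch below assumes lme4 and uses a simulated Poisson response on a random-intercept model rather than the model quoted above; the column count is a hypothetical name.

```r
library(lme4)

set.seed(4)
# Simulated stand-in for the 12-network design
dat <- data.frame(
  Blocks    = factor(rep(1:6, each = 2)),
  Treatment = factor(rep(c("excluded", "open"), times = 6)),
  Location  = factor(rep(c("low", "high"), each = 6))
)
block_eff <- rnorm(6, sd = 0.3)
dat$count <- rpois(12, lambda = exp(3 + block_eff[as.integer(dat$Blocks)]))

# Fit with the default optimizer settings
m1 <- glmer(count ~ Treatment + Location + (1 | Blocks),
            data = dat, family = poisson(link = "log"))

# Refit with a single, different optimizer and a larger evaluation budget
m2 <- glmer(count ~ Treatment + Location + (1 | Blocks),
            data = dat, family = poisson(link = "log"),
            control = glmerControl(optimizer = "bobyqa",
                                   optCtrl = list(maxfun = 2e5)))

# Close agreement between optimizers suggests a warning can be ignored
cbind(default = fixef(m1), bobyqa = fixef(m2))
c(default = as.numeric(logLik(m1)), bobyqa = as.numeric(logLik(m2)))

# A singular fit (variance estimated at zero) points to an
# overparameterized random-effects structure instead
isSingular(m1)
```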
> Lastly, is it appropriate to use interaction terms in GLMMs and lmers?
> I imagine that the rainfall level may interact with the treatment to
> influence the pollination metric.
> metric.model <- glmer(metric ~ Treatment*Location + (1 |Blocks), data
> = UHURUnets, family = gaussian(link = log))??
>
Definitely OK.
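As a sketch of how the interaction might be assessed, the additive and interaction models can be compared with a likelihood ratio test. This assumes lme4 and a simulated Gaussian response; the column metric here is simulated, and both models are fitted by maximum likelihood because they differ in their fixed effects.

```r
library(lme4)

set.seed(5)
# Simulated stand-in for the 12-network design
dat <- data.frame(
  Blocks    = factor(rep(1:6, each = 2)),
  Treatment = factor(rep(c("excluded", "open"), times = 6)),
  Location  = factor(rep(c("low", "high"), each = 6))
)
block_eff <- rnorm(6, sd = 1)
dat$metric <- 10 + block_eff[as.integer(dat$Blocks)] + rnorm(12, sd = 0.5)

# Additive model and model with the Treatment x Location interaction,
# both fitted by ML (REML = FALSE) so their likelihoods are comparable
m_add <- lmer(metric ~ Treatment + Location + (1 | Blocks),
              data = dat, REML = FALSE)
m_int <- lmer(metric ~ Treatment * Location + (1 | Blocks),
              data = dat, REML = FALSE)

# Likelihood ratio test for the interaction term
anova(m_add, m_int)
```

With only 12 networks the test has little power, so a non-significant result is weak evidence against an interaction.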