[R-sig-ME] model advice

Thu Jul 28 04:03:41 CEST 2016

Hello!

I'm a master's student studying pollination networks. I have been furiously trying to learn about linear mixed models and GLMMs, but I have some specific questions about my project analysis that I am hoping someone can help me with.

Here's the short version of my project; I can provide more details if need be. I am looking at 16 pollination metrics (e.g. specialization). A few of the metrics are count data (e.g. floral abundance) and several are proportions (bounded between 0 and 1). I am interested in how rainfall (high vs. low location) and wildlife exclusion (treatment) affect the pollination metrics. I have constructed 12 networks in total: 6 in the low-rainfall area (3 with wildlife excluded and 3 allowing wildlife) and 6 in the high-rainfall area (again 3 with wildlife excluded and 3 with wildlife included). So the sample size is obviously small. It's a block design with 3 blocks in the low-rainfall location and 3 blocks in the south (high-rainfall) location. Each block has both the wildlife-excluded treatment and the wildlife-allowed treatment.
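A minimal R sketch of the layout described above, for anyone who wants to reproduce the design. The data frame name `nets`, the level labels and the block labels are hypothetical stand-ins for the real `UHURUnets` columns; the sketch only checks that 2 locations x 3 blocks x 2 treatments gives 12 networks, with 3 networks per location x treatment cell.

```r
# Hypothetical reconstruction of the design (labels are made up, not from UHURUnets):
# 2 rainfall locations x 3 blocks per location x 2 wildlife treatments = 12 networks
nets <- expand.grid(
  Treatment = c("allowed", "excluded"),
  Block     = 1:3,
  Location  = c("low", "high")
)
# Give each block a unique label so blocks are explicitly nested within Location
nets$Blocks <- factor(paste(nets$Location, nets$Block, sep = "_"))

nrow(nets)                                 # total number of networks
xtabs(~ Location + Treatment, data = nets) # networks per location x treatment cell
```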

Here are my questions:

The majority of my metrics fit the model assumptions (normality of residuals, homogeneous variance within groups, normality within groups, normality of random effects, and linearity/absence of heteroskedasticity). However, for some metrics normality appears to be violated and the fitted vs. residuals plot looks bad. Various transformations (log, ln, sqrt, arcsin(sqrt)) don't seem to help. From reading papers by Dr. Ben Bolker, this appears to be where GLMMs come in.

So for the metrics that fit the model assumptions, my plan is to fit this model:

    metric.model <- lmer(metric ~ Treatment + Location + (1 | Blocks), data = UHURUnets)
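A sketch of this planned model on simulated data, assuming the lme4 package and the 12-network layout described above. The response values, effect sizes and the names `nets` and `metric` are invented for illustration only; with just 6 blocks, lmer may report a singular (boundary) fit where the block variance is estimated at zero.

```r
library(lme4)  # provides lmer()

set.seed(1)
# Hypothetical stand-in for UHURUnets: 12 networks in 6 blocks nested in 2 locations
nets <- expand.grid(Treatment = c("allowed", "excluded"), Block = 1:3,
                    Location = c("low", "high"))
nets$Blocks <- factor(paste(nets$Location, nets$Block, sep = "_"))

# Simulated metric: fixed treatment and location effects plus a random block intercept
block_eff <- rnorm(nlevels(nets$Blocks), sd = 0.5)
nets$metric <- 10 + 1 * (nets$Treatment == "excluded") -
  2 * (nets$Location == "high") +
  block_eff[as.integer(nets$Blocks)] + rnorm(nrow(nets), sd = 1)

# Random intercept for Blocks; Treatment and Location as fixed effects
metric.model <- lmer(metric ~ Treatment + Location + (1 | Blocks), data = nets)
summary(metric.model)
```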

But for the metrics where the model assumptions aren't met, I'm not sure how to pick which exponential-family distribution and which link to use. How does one go about deciding what family and link to use?

I read in Dr. Bolker's TREE paper that the binomial distribution with a logit link is best for proportions. Is this generally the case?

    NODF.M1 <- glmer(weighted_NODF ~ Treatment + Location + (1 | Blocks), data = UHURUnets, family = binomial(link = "logit"))
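One detail worth checking before using the binomial family: R's binomial family assumes each proportion is a number of successes out of a known number of trials. The sketch below (base R `glm()` on invented data; the names `n_trials`, `successes` and `treat` are hypothetical) shows two equivalent ways to supply that information; `glmer()` accepts the same `cbind()` response and `weights` argument. In `glm()`, a bare proportion with no trial counts is treated as one trial per observation, and R warns about non-integer successes.

```r
set.seed(2)
# Hypothetical example: proportions built from successes out of a known number of trials
n_trials  <- c(20, 35, 15, 40, 25, 30)
treat     <- factor(rep(c("allowed", "excluded"), times = 3))
successes <- rbinom(length(n_trials), size = n_trials,
                    prob = ifelse(treat == "excluded", 0.6, 0.4))
prop <- successes / n_trials

# Option 1: the proportion as response, with the number of trials as weights
fit_w <- glm(prop ~ treat, family = binomial(link = "logit"), weights = n_trials)

# Option 2: a two-column matrix of successes and failures as response
fit_cb <- glm(cbind(successes, n_trials - successes) ~ treat,
              family = binomial(link = "logit"))

# Both specifications give the same coefficient estimates
print(cbind(weights = coef(fit_w), cbind = coef(fit_cb)))
```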

For the count data (e.g. floral abundance, insect abundance), it seems from that same paper that I should use a Poisson distribution with a log link.

    No.Fl.units.M1 <- glmer(number_of_floral_units ~ Treatment + Location + (1 | Blocks), data = UHURUnets, family = poisson(link = "log"))
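A sketch of this Poisson GLMM on simulated counts, assuming lme4 and the same hypothetical 12-network layout (the simulated values and effect sizes are invented). It adds one common rough check that is not part of the original question: the ratio of the sum of squared Pearson residuals to the residual degrees of freedom. Values well above 1 suggest more variation than the Poisson allows (overdispersion); with only 12 networks the ratio is itself imprecise.

```r
library(lme4)  # provides glmer()

set.seed(3)
# Hypothetical stand-in for UHURUnets: 12 networks in 6 blocks nested in 2 locations
nets <- expand.grid(Treatment = c("allowed", "excluded"), Block = 1:3,
                    Location = c("low", "high"))
nets$Blocks <- factor(paste(nets$Location, nets$Block, sep = "_"))

# Simulated counts: effects are additive on the log scale (the log link)
block_eff <- rnorm(nlevels(nets$Blocks), sd = 0.3)
eta <- log(50) + 0.3 * (nets$Treatment == "excluded") -
  0.5 * (nets$Location == "high") + block_eff[as.integer(nets$Blocks)]
nets$number_of_floral_units <- rpois(nrow(nets), lambda = exp(eta))

fit <- glmer(number_of_floral_units ~ Treatment + Location + (1 | Blocks),
             data = nets, family = poisson(link = "log"))

# Rough overdispersion check: sum of squared Pearson residuals / residual df
disp_ratio <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
disp_ratio
```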

But what distribution and link would one use for continuous data that are not proportions?

And once I have fitted a GLMM, I assume it is okay that the model still does not meet the normality assumptions or look good in the residuals vs. fitted plots. Is this true?

My models (both GLMMs and LMMs fitted with lmer) currently have only random intercepts. I have read that it might be wise to include random slopes as well, because the response of the pollination metric to treatment and location could vary depending on which block it is in.

So then I believe I would have a model like this:

    Vuln.LL.M3 <- glmer(vulnerability.LL ~ Treatment + Location + (1 + Treatment | Blocks) + (1 + Location | Blocks), family = gaussian(link = "log"), data = UHURUnets)

I am not sure if this is correct. I get 2 warnings ("failed to converge" and "unable to evaluate scaled gradient"). Interestingly, I don't appear to get these warnings when I fit linear mixed models with lmer. Am I doing this correctly?
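A quick base-R design check that may bear on the random-slope question, using the same hypothetical rebuild of the layout (block labels invented). If each block sits in only one rainfall location, Location never varies within a block, so `(1 + Location | Blocks)` has nothing to estimate from; and with one network per block x treatment combination, a per-block Treatment slope cannot be separated from the residual variation. Either could plausibly contribute to the convergence warnings, though that is a guess rather than a diagnosis.

```r
# Hypothetical rebuild of the 12-network design (block labels are made up)
nets <- expand.grid(Treatment = c("allowed", "excluded"), Block = 1:3,
                    Location = c("low", "high"))
nets$Blocks <- factor(paste(nets$Location, nets$Block, sep = "_"))

# Number of distinct locations each block appears in (1 means Location is
# constant within the block, so a per-block Location slope is not estimable)
loc_per_block <- rowSums(table(nets$Blocks, nets$Location) > 0)
print(loc_per_block)

# Networks per block x treatment cell (1 means a per-block Treatment slope
# would rest on a single observation per cell)
obs_per_cell <- table(nets$Blocks, nets$Treatment)
print(obs_per_cell)
```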

Lastly, is it appropriate to use interaction terms in GLMMs and LMMs? I imagine that the rainfall level may interact with the treatment to influence the pollination metric.

    metric.model <- glmer(metric ~ Treatment * Location + (1 | Blocks), data = UHURUnets, family = gaussian(link = "log"))  # which family/link here?
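To see what `Treatment * Location` adds, this base-R sketch (hypothetical level labels, same 12-network layout) prints the fixed-effect design columns. The formula expands to both main effects plus an interaction column, which is the product of the two indicator columns and lets the treatment effect differ between rainfall locations. `lmer()` and `glmer()` expand the fixed-effect part of a formula the same way.

```r
# Hypothetical 12-network layout (level labels are made up)
nets <- expand.grid(Treatment = c("allowed", "excluded"), Block = 1:3,
                    Location = c("low", "high"))

# Fixed-effect design matrix implied by Treatment * Location
X <- model.matrix(~ Treatment * Location, data = nets)
colnames(X)

# The interaction column is the elementwise product of the two main-effect columns
all(X[, "Treatmentexcluded:Locationhigh"] ==
      X[, "Treatmentexcluded"] * X[, "Locationhigh"])
```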

Many thanks in advance for your help!

Cheers,
Travis
