[R-sig-eco] Book: The World of Zero-Inflated Models
Highland Statistics Ltd
highstat at highstat.com
Tue Aug 9 09:13:23 CEST 2022
Hello,
The following book can now be ordered (exclusively) from www.highstat.com:
The World of Zero-Inflated Models. Volume 1: Using GLM.
For a table of contents and two free chapters, see:
http://highstat.com/index.php/the-world-of-zero-inflated-models
Kind regards,
Alain
Outline of Volume 1.
In Chapter 2 we review data exploration and multiple linear regression
using red knot data. Stable isotope ratios of nitrogen in animal tissues
are modelled as a function of three covariates. This chapter serves as a
blueprint for all other chapters in the sense that it shows the general
outline of a statistical analysis.
Chapter 3 starts with a review of the Poisson distribution and the
Poisson GLM for the analysis of count data, using a small puffin data
set. We also introduce the negative binomial (NB) GLM and two relatively
unknown, but useful, members of the family, namely the generalised
Poisson (GP) GLM and the Conway-Maxwell-Poisson (CMP) GLM.
Surprisingly, the GP and CMP GLMs tend to perform better than the NB
GLM in the case of overdispersion, and, unlike the NB GLM, they can also
deal with underdispersion. Most models are fitted with the glmmTMB
package in R. Model validation tools are explained, and the concept of
simulating data from a fitted model (to verify whether the observed data
comply with all assumptions of the model) is introduced. We first do the
simulation steps ourselves, then quickly migrate to the DHARMa package,
which is rapidly gaining popularity.
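The simulation idea can be sketched in a few lines. This is a minimal
Python illustration, not code from the book (which uses R, glmmTMB and
DHARMa). It assumes an intercept-only Poisson GLM, whose fitted mean is
simply the sample mean. It simulates many data sets from that model and
reports how often a simulated data set contains at least as many zeros
as the observed one. The function names and the counts are hypothetical.

```python
import math
import random


def rpois(mu, rng):
    """Draw one Poisson(mu) value by multiplying uniforms (Knuth's method)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def zero_check(y, n_sim=1000, seed=1):
    """Compare observed zeros with zeros simulated from a fitted Poisson model.

    Assumes an intercept-only Poisson GLM, so the fitted mean is mean(y).
    Returns (observed zeros, list of simulated zero counts, upper-tail share).
    """
    rng = random.Random(seed)
    mu = sum(y) / len(y)
    observed = sum(1 for v in y if v == 0)
    simulated = []
    for _ in range(n_sim):
        sim = [rpois(mu, rng) for _ in y]
        simulated.append(sum(1 for v in sim if v == 0))
    p_upper = sum(1 for s in simulated if s >= observed) / n_sim
    return observed, simulated, p_upper


# Hypothetical counts with many zeros (not data from the book).
y = [0, 0, 0, 0, 0, 0, 1, 3, 5, 8]
obs, sims, p = zero_check(y)
print(obs, sum(sims) / len(sims), p)
```

With a mean of 1.7, a Poisson model expects only about 1.8 zeros among
10 observations (10 x exp(-1.7)), so six observed zeros give a very small
upper-tail share: a hint that the model does not comply with the data.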
In Chapter 4 we introduce zero-inflated models for count data, which
are fitted with the glmmTMB package. We start with a basic introduction
using simulated data, and discuss zero-inflated Poisson (ZIP),
zero-inflated NB (ZINB), zero-inflated generalised Poisson (ZIGP) and
zero-inflated CMP (ZICMP) models. We then apply them all to the puffin
data set.
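For reference, a ZIP model mixes two processes. With probability π an
observation is a false zero, and with probability 1 - π it comes from a
Poisson distribution with mean μ. Hence P(Y = 0) = π + (1 - π) exp(-μ),
P(Y = y) = (1 - π) μ^y exp(-μ) / y! for y > 0, and E(Y) = (1 - π) μ.
The Python sketch below is illustrative only (the book fits these models
in R with glmmTMB); the function names and the values of μ and π are
arbitrary assumptions, and the sums simply check the formulas numerically.

```python
import math


def zip_pmf(y, mu, pi):
    """Probability of count y under a zero-inflated Poisson (ZIP) model.

    pi is the probability of a false zero; mu is the Poisson mean.
    """
    poisson = math.exp(-mu) * mu ** y / math.factorial(y)
    if y == 0:
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


def zip_mean(mu, pi):
    """Expected value of a ZIP variable: (1 - pi) * mu."""
    return (1 - pi) * mu


mu, pi = 2.5, 0.3  # arbitrary illustrative values
probs = [zip_pmf(y, mu, pi) for y in range(60)]
print(sum(probs))                                           # total probability, close to 1
print(sum(y * p for y, p in enumerate(probs)), zip_mean(mu, pi))
print(probs[0], math.exp(-mu))                              # ZIP P(0) vs plain Poisson P(0)
```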
In Chapter 5 we analyse data on parasites in Brazilian sandperch. Such
data nearly always bring you within zero-inflation territory. Now that
we are familiar with Poisson, NB, GP and CMP models, and their
zero-inflated cousins, it is time to learn how to choose among them.
For example, how do we decide whether to apply an NB GLM or a ZIP model?
In this chapter, we keep the binary part of the model simple.
Chapter 6 is about ZIGP models, using data on mistletoe infections of
trees. The ZIGP models contain covariates in both the count and binary
parts of the model.
Hurdle models for count data are discussed in Chapter 7 using dolphin
sighting data. A hurdle model consists of two analyses whose results
are then combined. First, the sighting abundances are converted into
presence/absence data, and a Bernoulli GLM is applied. Second, the zero
counts are set to NA (or dropped), and a zero-truncated Poisson (or NB)
GLM is applied to the remaining non-zero counts. In the third step, the
two components are combined to calculate the expected values of the
hurdle model. Chapter 7 is relatively long because it covers many
potentially relevant topics: Bernoulli GLM, quasi-separation, truncated
Poisson and NB distributions, and zero-altered Poisson (ZAP) and
zero-altered NB (ZANB) models.
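The third step, combining the two components, can be written out
explicitly for a ZAP model. Let p be the probability of presence (a
non-zero count) from the Bernoulli GLM, and μ the mean of the Poisson
distribution before zero-truncation. The zero-truncated Poisson has mean
μ / (1 - exp(-μ)), so the hurdle model has E(Y) = p μ / (1 - exp(-μ)).
The Python sketch below is illustrative only (the book works in R); the
function names and the values of p and μ are hypothetical, and the
brute-force sum is there only to confirm the closed-form expression.

```python
import math


def truncated_poisson_pmf(y, mu):
    """P(Y = y | Y > 0) for a zero-truncated Poisson with untruncated mean mu."""
    if y < 1:
        return 0.0
    return math.exp(-mu) * mu ** y / math.factorial(y) / (1 - math.exp(-mu))


def hurdle_mean(p_presence, mu):
    """Expected value of a ZAP model: P(Y > 0) times the truncated-Poisson mean."""
    return p_presence * mu / (1 - math.exp(-mu))


p_presence, mu = 0.4, 2.0  # arbitrary illustrative values
# Hurdle probabilities: P(0) = 1 - p_presence, P(y) = p_presence * truncated pmf.
brute_force = sum(y * p_presence * truncated_poisson_pmf(y, mu) for y in range(1, 80))
print(hurdle_mean(p_presence, mu), brute_force)
```

In a hurdle model all zeros come from the binary part, whereas in the
zero-inflated models of Chapters 4 to 6 zeros can arise from both parts.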
In the last two chapters of this volume, we discuss models for the
analysis of continuous data with an excessive number of zeros. Lobster
biomass is analysed using Tweedie GLMs in Chapter 8, and a zero-altered
gamma (ZAG) model is applied to the same data in Chapter 9. The ZAG is a
hurdle model for continuous data. Our recommendation is to opt for the
Tweedie GLM approach.
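Why can a Tweedie GLM handle continuous data with exact zeros? For a
power parameter p between 1 and 2, a Tweedie variable can be generated
as the sum of a Poisson-distributed number N of gamma-distributed
amounts; when N = 0 the value is exactly zero. With mean μ and
dispersion φ, the standard mapping uses a Poisson rate
λ = μ^(2 - p) / (φ (2 - p)), a gamma shape (2 - p) / (p - 1) and a gamma
scale φ (p - 1) μ^(p - 1), so that P(Y = 0) = exp(-λ). The Python
simulation below is a minimal sketch under those assumptions, with
hypothetical function names and arbitrary values of μ, φ and p; the
book's own analyses are in R.

```python
import math
import random


def rpois(mu, rng):
    """Draw one Poisson(mu) value by multiplying uniforms (Knuth's method)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def rtweedie(mu, phi, power, rng):
    """Simulate one Tweedie value (1 < power < 2) as a compound Poisson-gamma sum."""
    lam = mu ** (2 - power) / (phi * (2 - power))   # Poisson rate
    shape = (2 - power) / (power - 1)               # gamma shape
    scale = phi * (power - 1) * mu ** (power - 1)   # gamma scale
    n = rpois(lam, rng)
    # fsum of an empty sequence is 0.0, so n == 0 gives an exact zero.
    return math.fsum(rng.gammavariate(shape, scale) for _ in range(n))


rng = random.Random(42)
mu, phi, power = 2.0, 1.5, 1.6  # arbitrary illustrative values
ys = [rtweedie(mu, phi, power, rng) for _ in range(20000)]
lam = mu ** (2 - power) / (phi * (2 - power))
print("share of exact zeros:", sum(1 for v in ys if v == 0) / len(ys))
print("theoretical P(Y = 0):", math.exp(-lam))
print("sample mean:", sum(ys) / len(ys), "target mean:", mu)
```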
--
Dr. Alain F. Zuur
Highland Statistics Ltd.
9 St Clair Wynd
AB41 6DZ Newburgh, UK
Email: highstat at highstat.com
URL: www.highstat.com