[R] Problems using gamlss to model zero-inflated and overdispersed count data: "the global deviance is increasing"

Gavin Simpson gavin.simpson at ucl.ac.uk
Thu Jun 3 09:27:04 CEST 2010


On Wed, 2010-06-02 at 13:45 +0200, Joris Meys wrote:
> Hi Diederik,
> 
> I can't say immediately why your models fail without seeing the data and
> running a number of tests. You could play around with the other parameters,
> as the problem might be related to the optimization algorithm.
> 
> A different approach would be using the methods in the vegan package. This
> package contains a set of ordination methods that allow to map environmental
> variables on the ordination space.

vegan is probably not too useful here as the response is univariate;
counts of ducks.

> You could also use the hurdle model from the pscl library, but this is not
> guaranteed to work better. It is not a zero-inflation model, but a
> two-component model that fits the zeros using eg a binomial and the rest of
> the counts using eg a poisson or a negative binomial.

zeroinfl() in the same (pscl) package will fit ZIP and ZINB models.
Neither of these will be a GAM though, so if the OP was wanting smooth
functions of covariates they may not so useful, although bs() in package
splines might be used to include smooth terms into these functions.

gam() in mgcv has the negative binomial family that might be a useful
starting point to modelling the overdispersion.

Package vgam has functions to fit ZIP/ZINB or the hurdle models
mentioned above with smooth functions of covariates.

In any case the OP should probably seek help directly from the gamlss
package maintainers to get gamlss working on their data or suggest why
the fitting isn't working.

HTH

G

> 
> Cheers
> Joris
> 
> On Wed, Jun 2, 2010 at 12:56 PM, Strubbe Diederik <diederik.strubbe at ua.ac.be
> > wrote:
> 
> > Dear all,
> >
> > I am using gamlss (Package gamlss version 4.0-0, R version 2.10.1, Windows
> > XP Service Pack 3 on a HP EliteBook) to relate bird counts to habit
> > variables. However, most models fail because  the global deviance is
> > increasing and I am not sure what causes this behaviour. The dataset
> > consists of counts of birds (duck) and 5 habit variables measured in the
> > field (n= 182). The dependent variable (the number of ducks
> > counted)suffers from zero-inflation and overdisperion:
> >
> > > proportion_non_zero <- (sum(ifelse(data$duck == 0,0,1))/182)
> > > mean <- mean(data$duck)
> > > var <- var(data$duck)
> > > proportion_non_zero
> > [1] 0.1153846
> > > mean
> > [1] 1.906593
> > > var
> > [1] 37.35587
> >
> > (I have no idea how to simulate a zero-inflated overdispersed Poisson
> > variable, but the data used can be found at
> > http://www.ua.ac.be/main.aspx?c=diederik.strubbe&n=23519).
> >
> >
> > First, I create a (strong) pattern in the dataset by:
> > data$LFAP200 <- data$LFAP200 + (data$duck*data$duck)
> >
> > I try to analyze these data by fitting several possible distributions
> > (Poisson PO, zero-inflated Poisson ZIP, negative binomial type I and type II
> > NBI NBII and zero-inflated negative binomial ZINBI) while using cubic
> > splines with a df=3. The best fitting model will then be choses on the basis
> > of its AIC.
> >
> > However, these models frequently fail to converge, and I am not sure why,
> > and what to do about it. For example:
> >
> > > model_Poisson <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family=
> > PO)
> > GAMLSS-RS iteration 1: Global Deviance = 1350.623
> > GAMLSS-RS iteration 2: Global Deviance = 1350.623
> >
> > > model_ZIPoisson <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family=
> > ZIP)
> > GAMLSS-RS iteration 1: Global Deviance = 326.3819
> > GAMLSS-RS iteration 2: Global Deviance = 225.1232
> > GAMLSS-RS iteration 3: Global Deviance = 319.9663
> > Error in RS() : The global deviance is increasing
> >  Try different steps for the parameters or the model maybe inappropriate
> > In addition: There were 14 warnings (use warnings() to see them)
> >
> > > model_NBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family=
> > NBI)
> > GAMLSS-RS iteration 1: Global Deviance = 291.8607
> > GAMLSS-RS iteration 2: Global Deviance = 291.3291
> > ######......######
> > GAMLSS-RS iteration 4: Global Deviance = 291.1135
> > GAMLSS-RS iteration 20: Global Deviance = 291.107
> > Warning message:
> > In RS() : Algorithm RS has not yet converged
> >
> > > model_NBII <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family=
> > NBII)
> > GAMLSS-RS iteration 1: Global Deviance = 284.5993
> > GAMLSS-RS iteration 2: Global Deviance = 281.9548
> > ######......######
> > GAMLSS-RS iteration 5: Global Deviance = 280.7311
> > GAMLSS-RS iteration 15: Global Deviance = 280.6343
> >
> > > model_ZINBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) +
> > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family=
> > ZINBI)
> > GAMLSS-RS iteration 1: Global Deviance = 1672.234
> > GAMLSS-RS iteration 2: Global Deviance = 544.742
> > GAMLSS-RS iteration 3: Global Deviance = 598.9939
> > Error in RS() : The global deviance is increasing
> >  Try different steps for the parameters or the model maybe inappropriate
> >
> >
> > Thus, in this case, only the Poisson (PO) and Negative Binomial type I
> > (NBI)converge whereas all other models fail
> >
> > My first approach was to omit the smoothing factors for each model, or
> > further reduce the number of variables but this does not solve the problem
> > and most models fail, often yielding a Error in RS() : The global deviance
> > is increasing message.
> >
> > I would think that, given the fact that the dependent variable is
> > zero-inflated and overdispersed, that the Zero-Inflated Negative Binomial
> > (ZINBI) distribution would be the best fit, but the ZINBI even fails in the
> > following very simple examples.
> >
> > > model_ZINBI <- gamlss(duck ~ cs(LFAP200,df=3),data=data,family= ZINBI)
> > GAMLSS-RS iteration 1: Global Deviance = 3508.533
> > GAMLSS-RS iteration 2: Global Deviance = 1117.121
> > GAMLSS-RS iteration 3: Global Deviance = 652.5771
> > GAMLSS-RS iteration 4: Global Deviance = 632.8885
> > GAMLSS-RS iteration 5: Global Deviance = 645.1169
> > Error in RS() : The global deviance is increasing
> >  Try different steps for the parameters or the model maybe inappropriate
> >
> > > model_ZINBI <- gamlss(duck ~ LFAP200,data=data,family= ZINBI)
> > GAMLSS-RS iteration 1: Global Deviance = 3831.864
> > GAMLSS-RS iteration 2: Global Deviance = 1174.605
> > GAMLSS-RS iteration 3: Global Deviance = 562.5428
> > GAMLSS-RS iteration 4: Global Deviance = 344.0637
> > GAMLSS-RS iteration 5: Global Deviance = 1779.018
> > Error in RS() : The global deviance is increasing
> >  Try different steps for the parameters or the model maybe inappropriate
> >
> >
> >
> > Any suggestions on how to proceed with this?
> >
> > Many thanks in advance,
> >
> >
> > Diederik
> >
> >
> > Diederik Strubbe
> > Evolutionary Ecology Group
> > Department of Biology
> > University of Antwerp
> > Groenenborgerlaan 171
> > 2020 Antwerpen, Belgium
> > tel: +32 3 265 3464
> >
> >
> >        [[alternative HTML version deleted]]
> >
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



More information about the R-help mailing list