[R-sig-ME] mixed multinomial regression for count data with overdispersion & zero-inflation

Fri May 20 19:37:34 CEST 2016

My Mac virtual machine is actually feeling more chipper than my Windows
virtual machine today, so I'll build you a special version that seems to
tease out solutions for both nb1 and nb2 and email it.  But the real
issue, I think, is diagnosing problems with difficult data sets.

One problem is that the glmmadmb R package simply reports that the
Hessian is not positive definite and quits.  See this link

https://stat.ethz.ch/pipermail/r-sig-mixed-models/2016q1/024527.html

for a case where one could conclude that the problem with fitting the 
model was due to
confounding between the zero inflation and overdispersion.

Now with your model, for one run I identified the negative eigenvalue
-15.28948399 of the Hessian and its eigenvector:

     eigenvalues unsorted:    1.917979995e-10 -0.0004551907790 
0.001293591754 0.003821048249 0.01431672721 0.1009762164 0.03925858165 
0.07403629854 0.1091338760 0.1562519927 0.1703625637 0.1927551515 
0.2234392242 0.2309947849 0.3469581818 0.3041749405 0.3580105168 
0.3942153084 0.4397529164 0.5078767603 0.5728201455 0.6012492489 
0.6789170419 0.7369245582 0.7971275668 0.8833795401 0.9287787445 
0.9508016682 1.037844369 1.049178898 1.707802018 2.724758782 3.365700202 
-15.28948399

eigenvector
  -0.0006942405368 -0.07288724821  0.04506821612  0.08379747141 
0.1184035873   0.1124181332   0.4898312779   0.2647606687 
-0.005219962867 -0.01694772700 -0.02195235540 -0.0004078488080 
-0.1039487736   0.4007922401 -0.007979620610 -0.02801923429 
-0.03638402585 -0.0004982359168  -0.1679864668   0.6019723457 
-0.01250628628 -0.04275353216 -0.05511398984 -0.001275366327 
-0.2523634973  0.09075331140 -0.0008355685062 -0.005755760214 
-0.007503459862 0.0001200448790 -0.02765030934 0.0001257008991 
-0.009465390476 -0.07780205094

This is a more difficult case, as it seems to involve almost all the
parameters.  However, the largest components of the eigenvector are all
for the parameters of the linear predictor.  So it is saying that maybe
your model is a bit overparameterized, or equivalently that the
parameters of the linear predictor are a bit confounded.
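
To see which components dominate the offending direction, one can rank
the eigenvector entries by absolute size.  The following is a minimal
Python sketch using the eigenvector printed above.  The positions are
just 1-based indices in the printed order; which model parameter each
index refers to is not given in the output, so that mapping is left
open.

```python
# Eigenvector for the eigenvalue -15.28948399, copied from the output above.
eigenvector = [
    -0.0006942405368, -0.07288724821, 0.04506821612, 0.08379747141,
    0.1184035873, 0.1124181332, 0.4898312779, 0.2647606687,
    -0.005219962867, -0.01694772700, -0.02195235540, -0.0004078488080,
    -0.1039487736, 0.4007922401, -0.007979620610, -0.02801923429,
    -0.03638402585, -0.0004982359168, -0.1679864668, 0.6019723457,
    -0.01250628628, -0.04275353216, -0.05511398984, -0.001275366327,
    -0.2523634973, 0.09075331140, -0.0008355685062, -0.005755760214,
    -0.007503459862, 0.0001200448790, -0.02765030934, 0.0001257008991,
    -0.009465390476, -0.07780205094,
]

# Rank components by absolute size; positions are 1-based, in printed order.
ranked = sorted(enumerate(eigenvector, start=1),
                key=lambda pair: abs(pair[1]), reverse=True)
top = ranked[:6]

for position, value in top:
    print(f"component {position:2d}: {value: .6f}")
```

A handful of components carry most of the weight, with many small
entries spread over the rest.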

Now in linear regression models one can try to deal with this situation
by employing ridge regression.  Really this is just putting a quadratic
penalty on the parameters.  We can do this and decrease the size of the
penalty in stages, finally doing away with it entirely if desired.  I
set this up in the version of glmmadmb I am sending you.
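
The idea of a staged quadratic penalty can be sketched on a toy problem.
This is not the glmmadmb implementation, only an illustration under
stated assumptions: the objective, the function names (gradient,
staged_ridge), the penalty schedule and the plain gradient descent
optimizer are all hypothetical.  The toy objective identifies the sum of
two parameters strongly and their difference only weakly, which mimics
confounded parameters in a linear predictor.

```python
def gradient(beta):
    """Gradient of the toy objective (hypothetical)
    0.5*(b1 + b2 - 2)**2 + 0.005*(b1 - b2)**2."""
    b1, b2 = beta
    well = b1 + b2 - 2.0      # strongly identified: the sum
    weak = 0.01 * (b1 - b2)   # weakly identified: the difference
    return [well + weak, well - weak]


def staged_ridge(grad, start, penalties, steps=200, rate=0.1):
    """Gradient descent on objective + 0.5*lam*sum(beta**2),
    warm-starting each stage from the result of the previous one."""
    beta = list(start)
    for lam in penalties:
        for _ in range(steps):
            g = grad(beta)
            # The penalty contributes lam * b to each gradient component.
            beta = [b - rate * (gi + lam * b) for b, gi in zip(beta, g)]
    return beta


start = [5.0, -3.0]
# Shrink the penalty in stages, ending with no penalty at all.
penalised = staged_ridge(gradient, start, [10.0, 1.0, 0.1, 0.01, 0.0])
# Same optimizer and start, but never penalised.
unpenalised = staged_ridge(gradient, start, [0.0])

print("staged penalty:", penalised)
print("no penalty:    ", unpenalised)
```

In this toy case the early, heavily penalised stages pin down the weakly
identified direction, so the final unpenalised stage ends near the
minimum at (1, 1), while the unpenalised run is still far from it after
the same number of steps per stage.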

However, that does not deal with your outlier problem.  For some reason a 
lot of count data analyses get published without any analysis of the 
residuals (at this point a disparaging remark about sociology is 
probably in order).

These are the worst outliers for the nb1 and nb2 models for your data:

1074 1074  413 5.13552e+01 1.36380e+01
1385 1385 4002 1.68879e+03 1.69679e+01
854   854  224 1.22219e+01 1.96515e+01
1691 1691 2713 8.33316e+02 2.27056e+01
1427 1427 1732 3.92621e+02 2.44684e+01
1433 1433 1612 3.25266e+02 2.72590e+01
1313 1313 1815 3.52356e+02 3.25137e+01
341   341 2031 3.55824e+02 4.22336e+01
191   191 5814 7.18097e+02 1.93656e+02
599   599 3586 2.68911e+02 2.19118e+02

1385 1385 4002 1.24681e+03 2.36563e+01
335   335 3012 4.87436e+02 5.08038e+01
1427 1427 1732 1.64012e+02 5.82439e+01
1433 1433 1612 1.39872e+02 6.02005e+01
1313 1313 1815 1.29395e+02 8.53171e+01
341   341 2031 1.39804e+02 9.94021e+01
1691 1691 2713 2.16615e+02 1.11783e+02
191   191 5814 6.96952e+02 1.45975e+02
599   599 3586 1.46454e+02 3.13865e+02

The term 3.13865e+02 corresponds to a residual of over 17 standard
deviations.  One might expect that these large outliers have a large
influence on the parameter estimates and will invalidate any
significance tests one might want to carry out.
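
The figure of 17 standard deviations can be checked by taking square
roots of the last column.  The columns are not labelled above, so
reading the last column as squared standardized residuals is an
assumption, though it is consistent with the figure quoted.  A small
Python sketch using the second table:

```python
import math

# Last column of the second table above, keyed by observation number.
# Assumption: these values are squared standardized residuals.
squared = {
    1385: 2.36563e+01,
    335: 5.08038e+01,
    1427: 5.82439e+01,
    1433: 6.02005e+01,
    1313: 8.53171e+01,
    341: 9.94021e+01,
    1691: 1.11783e+02,
    191: 1.45975e+02,
    599: 3.13865e+02,
}

# Convert each entry to a residual in standard-deviation units.
in_sd = {obs: math.sqrt(value) for obs, value in squared.items()}
worst = max(in_sd, key=in_sd.get)

for obs, sd in sorted(in_sd.items(), key=lambda item: item[1]):
    print(f"observation {obs:4d}: {sd:5.1f} standard deviations")
```

Under that reading, even the mildest entry in this table is several
standard deviations from its fitted value.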