[R-sig-ME] mixed multinomial regression for count data with overdispersion & zero-inflation
dave fournier
davef at otter-rsch.com
Fri May 20 19:37:34 CEST 2016
My Mac virtual machine is actually feeling more chipper than my Windows
virtual machine today, so I'll build you a special version that seems to
tease out solutions for both nb1 and nb2 and email it to you. But I think
the real issue is diagnosing problems with difficult data sets.
One problem is that the glmmADMB R package simply reports that the
Hessian is not positive definite and quits. See this link
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2016q1/024527.html
for a case where one could conclude that the difficulty in fitting the
model was due to confounding between the zero inflation and the
overdispersion.
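To see why confounding shows up as a Hessian that is not positive definite, here is a toy sketch in Python with NumPy. It uses a plain Poisson log-linear model, not the zero-inflated negative binomial models glmmADMB fits, and the helper names are hypothetical. When two parameters enter the likelihood only through their sum, the Hessian of the negative log-likelihood has an exactly zero eigenvalue; with near-confounding the smallest eigenvalue is merely tiny, and numerical error or an unconverged fit can easily tip such a flat direction below zero.

```python
import numpy as np

def poisson_nll_hessian(X, beta):
    """Hessian of the Poisson negative log-likelihood for log(mu) = X @ beta.

    For this model it equals X^T diag(mu) X and does not depend on y.
    """
    mu = np.exp(X @ beta)
    return X.T @ (mu[:, None] * X)

def is_positive_definite(H, rel_tol=1e-10):
    """Call H positive definite only if its smallest eigenvalue is clearly
    above zero relative to its largest one. Returns (flag, eigenvalues)."""
    eigvals = np.linalg.eigvalsh(H)  # ascending order for a symmetric matrix
    return bool(eigvals[0] > rel_tol * abs(eigvals[-1])), eigvals

rng = np.random.default_rng(0)
x = rng.normal(size=200)
beta = np.array([0.5, 0.2])

# Identifiable: an intercept plus a slope on a covariate that varies.
X_ok = np.column_stack([np.ones_like(x), x])
# Confounded: two "intercepts" that enter the likelihood only through their sum.
X_bad = np.column_stack([np.ones_like(x), np.ones_like(x)])

ok, ev_ok = is_positive_definite(poisson_nll_hessian(X_ok, beta))
bad, ev_bad = is_positive_definite(poisson_nll_hessian(X_bad, beta))
print(ok, ev_ok)    # True: both eigenvalues clearly positive
print(bad, ev_bad)  # False: smallest eigenvalue is zero up to rounding
```

The relative tolerance is a design choice: a fitted model's Hessian is computed numerically, so "exactly zero" is rarely observed and a threshold relative to the largest eigenvalue is more meaningful than a raw sign test.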
Now, for one run of your model, I identified the negative eigenvalue
-15.28948399 of the Hessian. The eigenvalues (unsorted) are:
1.917979995e-10 -0.0004551907790 0.001293591754 0.003821048249 0.01431672721
0.1009762164 0.03925858165 0.07403629854 0.1091338760 0.1562519927
0.1703625637 0.1927551515 0.2234392242 0.2309947849 0.3469581818
0.3041749405 0.3580105168 0.3942153084 0.4397529164 0.5078767603
0.5728201455 0.6012492489 0.6789170419 0.7369245582 0.7971275668
0.8833795401 0.9287787445 0.9508016682 1.037844369 1.049178898
1.707802018 2.724758782 3.365700202 -15.28948399
The eigenvector for the eigenvalue -15.28948399 is:
-0.0006942405368 -0.07288724821 0.04506821612 0.08379747141
0.1184035873 0.1124181332 0.4898312779 0.2647606687
-0.005219962867 -0.01694772700 -0.02195235540 -0.0004078488080
-0.1039487736 0.4007922401 -0.007979620610 -0.02801923429
-0.03638402585 -0.0004982359168 -0.1679864668 0.6019723457
-0.01250628628 -0.04275353216 -0.05511398984 -0.001275366327
-0.2523634973 0.09075331140 -0.0008355685062 -0.005755760214
-0.007503459862 0.0001200448790 -0.02765030934 0.0001257008991
-0.009465390476 -0.07780205094
This is a more difficult case, as it seems to involve almost all the
parameters. However, the largest components all belong to parameters of
the linear predictor. So it suggests that your model may be somewhat
overparameterized, or equivalently that the parameters of the linear
predictor are somewhat confounded.
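The diagnostic described above (find the negative eigenvalue of the Hessian, then see which parameters dominate its eigenvector) can be sketched in Python with NumPy. This is an illustration, not glmmADMB's own code, and the function names are hypothetical. Applied to the eigenvector posted above, it lists the positions of the largest components (0-based; add 1 for R-style indexing). Mapping those positions to named parameters requires the model's parameter ordering, which is not shown here.

```python
import numpy as np

# Eigenvector for the eigenvalue -15.28948399, copied from the output above.
EIGVEC = np.array([
    -0.0006942405368, -0.07288724821, 0.04506821612, 0.08379747141,
    0.1184035873, 0.1124181332, 0.4898312779, 0.2647606687,
    -0.005219962867, -0.01694772700, -0.02195235540, -0.0004078488080,
    -0.1039487736, 0.4007922401, -0.007979620610, -0.02801923429,
    -0.03638402585, -0.0004982359168, -0.1679864668, 0.6019723457,
    -0.01250628628, -0.04275353216, -0.05511398984, -0.001275366327,
    -0.2523634973, 0.09075331140, -0.0008355685062, -0.005755760214,
    -0.007503459862, 0.0001200448790, -0.02765030934, 0.0001257008991,
    -0.009465390476, -0.07780205094,
])

def negative_directions(H):
    """Return (eigenvalue, eigenvector) pairs of symmetric H with eigenvalue < 0."""
    vals, vecs = np.linalg.eigh(H)  # vals ascending; vecs[:, i] pairs with vals[i]
    return [(vals[i], vecs[:, i]) for i in np.flatnonzero(vals < 0)]

def top_loadings(vec, k=5):
    """0-based indices of the k largest |components|, largest first, with values."""
    order = np.argsort(-np.abs(vec))[:k]
    return order.tolist(), vec[order]

# Toy indefinite matrix: eigenvalues -1 and 3.
toy = negative_directions(np.array([[1.0, 2.0], [2.0, 1.0]]))
print(toy)

idx, loadings = top_loadings(EIGVEC)
print(idx)  # [19, 6, 13, 7, 24]
```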
Now, in linear regression models one can try to deal with this situation
by employing ridge regression, which is really just putting a quadratic
penalty on the parameters. We can do this and decrease the size of the
penalty in stages, finally doing away with it entirely if desired. I set
this up in the version of glmmADMB I am sending you.
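A minimal sketch of the staged ridge idea in Python with NumPy, for a plain Poisson log-linear model with two nearly collinear covariates on simulated (hypothetical) data. It is not the ADMB code in the glmmADMB build mentioned above; the penalty form lam * sum(beta_j^2) on the slopes only, the penalty schedule, and the function names are all assumptions. Each stage warm-starts from the previous stage's estimates, and the last stage uses lam = 0, i.e. the unpenalized fit.

```python
import numpy as np

def penalized_nll(X, y, beta, lam, pen):
    """Poisson negative log-likelihood (dropping log(y!)) plus lam * sum(pen * beta^2)."""
    eta = X @ beta
    return np.sum(np.exp(eta) - y * eta) + lam * np.sum(pen * beta**2)

def fit_poisson_ridge(X, y, lam, beta0, max_iter=200, gtol=1e-8):
    """Newton's method with step halving for the ridge-penalized Poisson fit."""
    pen = np.ones(X.shape[1])
    pen[0] = 0.0  # assumption: the intercept (column 0) is not penalized
    beta = beta0.copy()
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (mu - y) + 2.0 * lam * pen * beta
        if np.max(np.abs(grad)) < gtol:
            break
        hess = X.T @ (mu[:, None] * X) + 2.0 * lam * np.diag(pen)
        step = np.linalg.solve(hess, grad)
        f0 = penalized_nll(X, y, beta, lam, pen)
        t = 1.0
        # Halve the step while it clearly increases the objective
        # (the small slack ignores floating-point noise near the optimum).
        while (penalized_nll(X, y, beta - t * step, lam, pen) > f0 + 1e-12 * abs(f0)
               and t > 1e-8):
            t *= 0.5
        beta = beta - t * step
    return beta

def staged_ridge(X, y, lambdas):
    """Refit with a decreasing sequence of penalties, warm-starting each stage."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start the intercept at the log of the mean count
    path = []
    for lam in lambdas:
        beta = fit_poisson_ridge(X, y, lam, beta)
        path.append((lam, beta.copy()))
    return path

# Hypothetical simulated data with two nearly collinear covariates.
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = rng.poisson(np.exp(0.3 + 0.5 * x1 + 0.5 * x2))

path = staged_ridge(X, y, lambdas=[100.0, 10.0, 1.0, 0.1, 0.0])
for lam, beta in path:
    print(f"lam={lam:>6}: {np.round(beta, 3)}")
```

The heavy early penalty keeps the two confounded slopes small and stable; relaxing it gradually lets the fit move toward the unpenalized estimates from a good starting point instead of from scratch.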
However, that does not deal with your outlier problem. For some reason, a
lot of count data analyses get published without any analysis of the
residuals (at this point a disparaging remark about sociology is
probably in order).
These are the worst outliers in your data for the nb1 and nb2 models,
one block of rows per model. Columns: observation index (printed twice),
observed count, fitted mean, squared standardized residual.

1074 1074 413 5.13552e+01 1.36380e+01
1385 1385 4002 1.68879e+03 1.69679e+01
854 854 224 1.22219e+01 1.96515e+01
1691 1691 2713 8.33316e+02 2.27056e+01
1427 1427 1732 3.92621e+02 2.44684e+01
1433 1433 1612 3.25266e+02 2.72590e+01
1313 1313 1815 3.52356e+02 3.25137e+01
341 341 2031 3.55824e+02 4.22336e+01
191 191 5814 7.18097e+02 1.93656e+02
599 599 3586 2.68911e+02 2.19118e+02

1385 1385 4002 1.24681e+03 2.36563e+01
335 335 3012 4.87436e+02 5.08038e+01
1427 1427 1732 1.64012e+02 5.82439e+01
1433 1433 1612 1.39872e+02 6.02005e+01
1313 1313 1815 1.29395e+02 8.53171e+01
341 341 2031 1.39804e+02 9.94021e+01
1691 1691 2713 2.16615e+02 1.11783e+02
191 191 5814 6.96952e+02 1.45975e+02
599 599 3586 1.46454e+02 3.13865e+02
The value 3.13865e+02 corresponds to a residual of over 17 standard
deviations (sqrt(313.865) is about 17.7).
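For anyone wanting to run the same kind of residual check, here is a short Python/NumPy sketch. It assumes the common parameterizations Var(y) = phi * mu for NB1 and Var(y) = mu + mu^2 / k for NB2; check these against how your fitting software defines its families. The counts, fitted means, and dispersion values are hypothetical, and the helper names are illustrative.

```python
import numpy as np

def sq_pearson_residuals(y, mu, var):
    """Squared standardized (Pearson) residuals (y - mu)^2 / Var(y)."""
    return (np.asarray(y, dtype=float) - mu) ** 2 / var

def nb1_var(mu, phi):
    # NB1: variance proportional to the mean.
    return phi * mu

def nb2_var(mu, k):
    # NB2: quadratic mean-variance relationship.
    return mu + mu**2 / k

def worst(r2, n=3):
    """Indices of the n largest squared residuals (largest first) and their size in SDs."""
    order = np.argsort(-r2)[:n]
    return order.tolist(), np.sqrt(r2[order])

# Hypothetical observed counts and fitted means.
y = np.array([0, 5, 40])
mu = np.array([2.0, 4.0, 5.0])

r2_nb2 = sq_pearson_residuals(y, mu, nb2_var(mu, k=2.0))
r2_nb1 = sq_pearson_residuals(y, mu, nb1_var(mu, phi=3.0))
print(worst(r2_nb2), worst(r2_nb1))

# Converting a squared standardized residual back to standard deviations.
print(np.sqrt(313.865))  # about 17.7
```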
One might expect these large outliers to have a large influence on the
parameter estimates and to invalidate any significance tests one might
want to carry out.