[R] negative binomial regression, unbalanced panel
hl
hl.statlist at googlemail.com
Wed Nov 24 17:44:07 CET 2010
I am a student who is doing empirical work for his thesis and trying to
switch to R. I am familiar with Stata, and at the moment I am trying to
replicate some of my previous work.
I have a large unbalanced panel data set with observations for different
countries between 1970 and 2007. My dependent variable is an
overdispersed count. So far I have used fixed-effects negative binomial
regression, i.e. assuming constant within-group dispersion. The command
in Stata is xtnbreg, fe.
How could I replicate this in R?
I found the package pglm and tried the following:
pglm(T_total ~ Lgdpqt_2 + Lgdpqt_3 + Lgdpqt_4 + lpop + yrsconflict +
       past_T_total + Lpolcat_2 + Lpolcat_3 + Lpolcat_4 + Lgdpgr +
       mob_fixed + wdi_urbanpop + Lopen + Ldurable + factor(year),
     data = df, family = negbin, model = "within",
     index = c("code", "year"))
This takes ages, and then returns the following
Maximum Likelihood estimation
Newton-Raphson maximisation, 3 iterations
Return code 3: Last step could not find a value above the current.
Boundary of parameter space?
Consider switching to a more robust optimisation method temporarily.
Log-Likelihood: 112720.7
46 free parameters
Estimates:
Estimate Std. error t value Pr(> t)
(Intercept) -177.015528 70.277178 -2.5188 0.01177 *
Lgdpqt_2 -34.386693 NA NA NA
Lgdpqt_3 -26.709422 NA NA NA
Lgdpqt_4 -53.875809 NA NA NA
lpop 34.821642 NA NA NA
yrsconflict -8.693849 NA NA NA
past_T_total -9.558045 NA NA NA
Lpolcat_2 -11.601625 NA NA NA
Lpolcat_3 2.397754 0.374797 6.3975 1.580e-10 ***
Lpolcat_4 -11.661048 NA NA NA
..........
and several warnings.
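NA standard errors like those above typically mean that the estimated Hessian could not be inverted, so no variance is available for those coefficients. One possible cause, offered here as an assumption rather than a diagnosis of this data set, is exact collinearity among the regressors. A minimal sketch of a rank check on a hypothetical toy design (the data frame `d` and the columns `x1`, `x2`, `x3` are made up for illustration):

```r
# Hypothetical toy data: x3 is an exact linear combination of x1 and x2
set.seed(1)
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
d$x3 <- d$x1 + 2 * d$x2

# Build the design matrix the formula would produce (factors expand to dummies)
X  <- model.matrix(~ x1 + x2 + x3, data = d)
qx <- qr(X)

# A rank below the number of columns means some columns are redundant
qx$rank < ncol(X)                                    # TRUE
# qr() pivots (near-)dependent columns to the end; those past the rank are aliased
aliased <- colnames(X)[qx$pivot[-seq_len(qx$rank)]]
aliased                                              # "x3"
```

With real data the same check would be run on model.matrix() of the full formula, including factor(year); adding factor(code) to that formula would also flag regressors that are constant within each country, which a within model cannot estimate. The check ignores the likelihood itself, so it can only rule out one source of the problem.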
If I drop the year dummies (is factor(year) more appropriate than a
list of variables?), the results are the same as in Stata, but it still
takes quite long and the warnings persist. I suspect the problem lies
in how the estimation sample is determined. Stata automatically drops
groups whose outcomes are all zero and groups with only one observation,
as well as year dummies that are unnecessary (I run the same regression
for several different dependent variables). The documentation for pglm
mentions that there might be problems with unbalanced panels.
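A minimal sketch of applying the two sample rules described above in base R before calling pglm, on a hypothetical toy panel (the values are made up; `code`, `year` and `T_total` mirror the names in the call above, `year_f` is a hypothetical helper column). It also shows that factor(year) simply expands into one dummy per year except a reference year, and that droplevels() removes years left without observations after filtering. It does not reproduce Stata's dropping of collinear dummies, which would need a separate check.

```r
# Hypothetical toy panel: country 'code', 'year' and count outcome 'T_total'
df <- data.frame(
  code    = c("A", "A", "A", "B", "B", "C", "D", "D"),
  year    = c(1970, 1971, 1972, 1973, 1974, 1975, 1971, 1972),
  T_total = c(0, 3, 1, 0, 0, 5, 2, 4),
  stringsAsFactors = FALSE
)
df$year_f <- factor(df$year)   # year factor built on the full sample

# Per-country flags, spread back to one value per row by ave()
all_zero <- as.logical(ave(df$T_total, df$code, FUN = function(y) all(y == 0)))
n_obs    <- ave(df$T_total, df$code, FUN = length)

# Keep countries with at least one non-zero outcome and more than one observation
est <- df[!all_zero & n_obs > 1, ]
est$year_f <- droplevels(est$year_f)   # drop years with no remaining rows

levels(est$year_f)                     # "1970" "1971" "1972"
# factor(year) in a formula expands to dummies, with the first year as reference
colnames(model.matrix(~ year_f, data = est))
# "(Intercept)" "year_f1971" "year_f1972"
```

Country B (all zeros) and country C (one observation) are dropped, leaving A and D. Because pglm's formula interface handles factor(year) itself, writing factor(year) is equivalent to listing the year dummies by hand, minus the reference year.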
How could I go about doing this? Did I make a mistake using pglm, or is
it simply unsuited to my task? I think this could possibly be formulated
as a mixed model. I looked into nlme, which AFAIK doesn't support the
negative binomial family. I hope this is a relevant question; I couldn't
find anything else on the web or on this list, and a similar question
on Stack Overflow was left without a suitable answer.
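As a possible cross-check, and explicitly a swapped-in technique rather than a replication of xtnbreg, fe, one could fit an unconditional negative binomial model with explicit country dummies using glm.nb() from the MASS package, which ships with R. Stata's xtnbreg, fe conditions the country effects out of the likelihood, so the coefficients will generally differ. The sketch uses simulated data; the data frame `d`, the regressor `x` and the outcome `y` are hypothetical. For a random-effects (mixed-model) formulation, the lme4 package provides glmer.nb(), not shown here.

```r
library(MASS)   # recommended package shipped with R; provides glm.nb()

set.seed(42)
# Hypothetical simulated panel: 30 countries observed over 20 years
d <- expand.grid(code = factor(1:30), year = 1970:1989)
alpha <- rnorm(30, sd = 0.5)                   # country-specific effects
d$x <- rnorm(nrow(d))                          # one regressor
mu  <- exp(0.5 + alpha[d$code] + 0.3 * d$x)    # NB mean
d$y <- rnbinom(nrow(d), mu = mu, size = 2)     # overdispersed counts

# Unconditional NB with country and year dummies
# (a different estimator from the conditional model behind xtnbreg, fe)
fit <- glm.nb(y ~ x + factor(year) + code, data = d)

coef(fit)["x"]   # should be close to the simulated 0.3
fit$theta        # estimated NB size (dispersion) parameter
```

With 30 countries and 20 years this fits 50 coefficients in a few seconds; with many more countries the dummy approach gets slower and is subject to the usual incidental-parameters concerns of fixed-effects dummies in nonlinear models.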
On a side note, I'm used to using underscores in variable names, but I
have read that this is not good practice in R and that dots should be
used instead. What's the reason behind that?
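Both spellings are valid names in current R. One historical reason underscores were discouraged is that in older versions of S and R `_` was an assignment operator; today it is an ordinary name character. Dots are not entirely neutral either: S3 method dispatch uses names of the form generic.class, so a dotted function name can be picked up as a method. A short illustration (the class name `myclass` is made up):

```r
# Both spellings are syntactically valid: make.names() returns them unchanged
make.names(c("T_total", "T.total"))   # "T_total" "T.total"

# A dot is also the S3 method separator (generic.class):
# a function named print.myclass becomes the print() method for class "myclass"
print.myclass <- function(x, ...) cat("custom print method called\n")
obj <- structure(list(), class = "myclass")
print(obj)                            # custom print method called
```

For data columns such as T_total this dispatch issue does not arise, so the choice there is largely a matter of style.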
Thanks very much for your help,
hl
More information about the R-help
mailing list