[R-sig-ME] Modeling attacks and victories
Poe, John
jdpo223 at g.uky.edu
Thu Apr 20 04:48:22 CEST 2017
The standard way to deal with this in political science is to aggregate up to
a count of attempted or successful attacks (depending on the question, I
suppose) and use something like a zero-inflated mixed-effects negative
binomial model, with a nonparametric random effect, because of the zero
inflation and the low mean relative to the long tail.
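In R, the parametric cousin of that model is straightforward with glmmTMB; a
minimal sketch, assuming a country-year data frame `d` with a count `attacks`
and a covariate `gdp` (both names hypothetical), and noting that glmmTMB uses
a Gaussian random intercept rather than the nonparametric one mentioned above:

```r
library(glmmTMB)

## Zero-inflated negative binomial with a country random intercept.
## `attacks`, `gdp`, and `country` are hypothetical column names.
m_zinb <- glmmTMB(
  attacks ~ gdp + (1 | country),  # conditional (count) part
  ziformula = ~ 1,                # constant zero-inflation probability
  family   = nbinom2,             # negative binomial, quadratic variance
  data     = d
)
summary(m_zinb)
```

`ziformula` can also take country-year covariates if the excess zeroes are
thought to have their own predictors.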
One way to expand that might be to model a count of terror attack attempts as
a function of the number of successful attacks in the country last year and
an indicator for whether there were any attempts last year. The data are
pretty highly correlated over time, so I'd assume past experiences drive
current ones. That's going to induce correlation between the lagged
variables and the random effects, which causes bias, but it can be dealt with
as a multilevel structural equation model, following a paper by Paul Allison
on dynamic panel data modeling using maximum likelihood. It's published in
an econometrics journal now, but I can't remember which one.
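A naive version of that dynamic specification (ignoring, for the moment, the
lag/random-effect correlation that Allison's approach is designed to handle)
might look like this, with all variable names hypothetical:

```r
library(dplyr)
library(glmmTMB)

## Build the lagged predictors within each country.
## `attempts`, `successes`, `year`, `country` are hypothetical names.
d <- d %>%
  group_by(country) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(
    succ_lag    = lag(successes),              # successful attacks last year
    any_att_lag = as.integer(lag(attempts) > 0) # any attempts last year?
  ) %>%
  ungroup()

## Naive dynamic model: lags enter as ordinary covariates, so the
## correlation between them and the country intercept is not addressed.
m_dyn <- glmmTMB(
  attempts ~ succ_lag + any_att_lag + (1 | country),
  ziformula = ~ 1,
  family    = nbinom2,
  data      = d
)
```

Treat this as a starting point only; the bias described above is still
present here.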
I wouldn't use the proportion successful as an outcome unless the student is
specifically trying to answer what makes terrorist groups more successful at
attacks, given that they try to attack. That's a very specific research
question, but you could manage it with a selection model: any attempt
(yes/no) as the selection-stage outcome and the proportion successful as the
second-stage model. If you don't separate the zeroes into a never-attempted
group and a tried-but-failed group, any reviewer in political science will
start screaming about terrorists selecting countries where they are more
likely to succeed as places to attempt attacks, so that success drives
attempts. Even then, I can see it being a problem to model the proportion,
given that a low success rate may lead to fewer future attempts, which could
lower it even more.
This paper might help.
Grilli, L. and Rampichini, C., 2010. Selection bias in linear mixed models.
Metron, 68(3), pp.309-329.
You could model this in Stata or Mplus as a random-effects selection model
pretty easily, but in R you would probably need Stan to let you correlate
the random effects and error terms. I'm not sure if there's another package
that would.
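One possibly relevant option: brms (which fits models via Stan under the
hood) can correlate group-level effects across two formulas with its
`|<id>|` syntax. A sketch under assumed variable names, and with the caveat
that this is a generic correlated two-equation setup, not a full
Heckman-style selection model with correlated residuals:

```r
library(brms)

## `attempted` (0/1), `successes`, `attempts`, `gdp`, `country` are
## hypothetical names; `has_attack` is a precomputed logical flag.
d$has_attack <- d$attempts > 0

## The shared `c` in (1 | c | country) tells brms to estimate the
## correlation between the two country-level intercepts.
f <- bf(attempted ~ gdp + (1 | c | country),
        family = bernoulli()) +
  bf(successes | trials(attempts) + subset(has_attack) ~
       gdp + (1 | c | country),
     family = binomial()) +
  set_rescor(FALSE)

## m <- brm(f, data = d)  # compiles and samples with Stan
```

This only correlates the random intercepts, not the observation-level
errors, so it is weaker than the full selection model described above.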
What you described with an incident-level model is, I think, a hurdle model:
you model any attack (yes/no) and, given any attack, successful attack
(yes/no), with two logistic regression models with correlated error terms.
You can find something similar in the craggit extension of the tobit model
out of econometrics. Again, you could fit it in Stata or Mplus easily
enough, but I don't know of an R package that will let you correlate the
decomposed errors across two logistic regressions and give you random
intercepts, unless you designed it in something like Stan.
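For comparison, the uncorrelated version of that two-part model is easy in
lme4; a sketch with hypothetical variable names, whose missing piece is
exactly the correlation between the two equations:

```r
library(lme4)

## Stage 1: any attack in the country-year (logistic, random intercept).
## `any_attack`, `gdp`, `country` are hypothetical column names.
m_any <- glmer(any_attack ~ gdp + (1 | country),
               family = binomial, data = d)

## Stage 2: successes out of attempts, only where attacks occurred.
d_pos  <- subset(d, attacks > 0)
m_succ <- glmer(cbind(successes, attacks - successes) ~ gdp + (1 | country),
                family = binomial, data = d_pos)

## The two country intercepts are estimated independently here; the
## correlated version described above would need Stan.
```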
I'd also worry about truncating out the zero-attempt country-years, as it's
going to distort the random-effects estimates away from zero.
Hope that helps. I'm pretty tired at the moment so it might not be
entirely coherent.
On Apr 19, 2017 11:39 AM, "Paul Johnson" <pauljohn32 at gmail.com> wrote:
Could I ask for pointers on how to guide a student in my multilevel
modeling course?
The outcome data is terrorist attack events, with one row per event
(events are listed by country and year). The data also indicates if
each attack is a "success" (I have no idea how that's measured, if it
matters I can find out).
The student says that, in his field, what they would do is aggregate
events at the country/year level to create a "proportion of successful
attacks" variable. If a country has no events, then it is scored as a
0. Then they'd run random intercept models using country as case
identifier, possibly with other country level predictors that vary
across time.
I think we can do better than that. The number of events within
countries varies widely, some have 0 or 1 attack, while in some years
there are 30 or more. Measuring the proportions is, obviously,
sensitive to the number in the denominator. Some countries are scored
on a scale 0, .5, 1, while others are scored as 0, 0.03, 0.06, and so
forth. Another obvious problem is the presence of 0's.
My first idea was to make this a binomial GLM and predict successes as
a proportion of attacks. That's a problem because there are lots of 0
attack country/years, but also because I'm
It looks to me like we need to explore this as a two part model, where
part 1 predicts (attacks > 0) and part 2 is binomial among the
countries and places where attacks > 0. I'm not finding discussion of
this particular example while searching (I probably don't know the
magic words). However, we need to insert the country-level intercept
in both models, and perhaps the country effect needs to be correlated
between the two models.
pj
--
Paul E. Johnson http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu
To write to me directly, please address me at pauljohn at ku.edu.
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models