[R-sig-ME] lmer: Model with crossed and nested factors, unbalanced data
Clara Vandeweerdt
clara.vdw at gmail.com
Wed Jul 3 20:59:23 CEST 2013
Dear all,
For a research project on climate legislation in the U.S., I am analyzing
data on the votes that Senators cast on several cap-and-trade bills in the
period 2003-2008. For each Senator, we have data on how he or she voted
on a given bill ('yea' or 'nay'), provided, of course, that the Senator
held a seat in Congress in the year the bill was voted
upon. We want to explain the voting behavior of these Senators given
characteristics of the Senators and of their constituencies, that is, the
states they represent, but at the same time take into account the nested
structure of the data.
Thus, the data looks as follows:
state  Senator         bill       vote
FL     'Bill Nelson'   'CSA2003'  'yea'
FL     'Bob Graham'    'CSA2003'  'yea'
FL     'Bill Nelson'   'CSA2005'  'yea'
FL     'Mel Martinez'  'CSA2005'  'nay'
(See attachment for a sample of the data.)
One way to analyze such data seems to be a mixed model with both crossed
and nested random factors. First, Senators are expected to behave
consistently over time: their votes on different bills should be similar.
Second, pairs of Senators represent the same state: for example, in 2003,
Bill Nelson and Bob Graham both represented Florida. So, there seems to be
a random effect of Senators, which are nested in states. Third, there would
be a random effect of bill, which is crossed with states and Senators.
Finally, the model should be logistic, as votes can be either 'yea' or
'nay'.
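The structure described above can be written out as a random-intercept logistic model. This is only a sketch, and the notation is an assumption introduced here, not part of the data or the code: y_{ik} is the vote of Senator i on bill k (1 for 'yea'), j(i) is the state that Senator i represents, and s, u and b are the random intercepts for state, Senator and bill.

```latex
\operatorname{logit}\,\Pr(y_{ik} = 1) = \beta_0 + s_{j(i)} + u_i + b_k,
\qquad
s_j \sim \mathcal{N}(0, \sigma_s^2),\quad
u_i \sim \mathcal{N}(0, \sigma_u^2),\quad
b_k \sim \mathcal{N}(0, \sigma_b^2)
```

Nesting shows up in the index j(i): each Senator belongs to exactly one state. Crossing shows up in b_k, which is shared by all Senators and states that voted on bill k.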
1. How should I specify such a model? Is it sufficient just to specify both
the nested random effects of Senator and state, as well as the random
effect of bill (in analogy to this post:
http://r.789695.n4.nabble.com/lmer-crossed-random-effects-specification-td831762.html)?
For example, in case of a model with only random intercepts for Senator,
state and bill:
library(lme4)

# Read the tab-separated sample; a vote coded -1 is treated as missing
dataSenate <- read.table("sampledata.txt", header = TRUE, sep = "\t",
                         na.strings = c("-1"))
dataSenate$state   <- as.factor(dataSenate$state)
dataSenate$Senator <- as.factor(dataSenate$Senator)
dataSenate$bill    <- as.factor(dataSenate$bill)

# Random intercepts for state, Senator within state, and bill
interceptonly <- glmer(vote ~ 1 + (1 | state/Senator) + (1 | bill),
                       data = dataSenate, family = binomial(link = "logit"))
Or should I use the pdBlocked and pdIdent formulation that is suggested
here: http://tolstoy.newcastle.edu.au/R/help/02b/2068.html?
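Before settling on a formula, it may help to confirm the grouping structure directly in the data. The sketch below uses only base R and the Florida and West Virginia rows from the attached sample; the object names (sample_rows, states_per_senator and so on) are made up for illustration. It checks that each Senator occurs under exactly one state (nested) and that each bill occurs in more than one state (crossed).

```r
# FL and WV rows copied from the attached sample (structure only, no votes)
sample_rows <- data.frame(
  state   = rep(c("FL", "WV"), each = 6),
  bill    = rep(c("CSA2003", "CSA2003", "ACSA2008", "ACSA2008",
                  "CSA2005", "CSA2005"), times = 2),
  Senator = c("Bill Nelson", "Bob Graham", "Bill Nelson", "Mel Martinez",
              "Bill Nelson", "Mel Martinez",
              "John Jay Rockefeller", "Robert Byrd", "John Jay Rockefeller",
              "Robert Byrd", "John Jay Rockefeller", "Robert Byrd")
)

# Nested: every Senator label should occur under exactly one state
states_per_senator <- tapply(sample_rows$state, sample_rows$Senator,
                             function(s) length(unique(s)))
senators_nested <- all(states_per_senator == 1)

# Crossed: every bill should occur in more than one state
states_per_bill <- tapply(sample_rows$state, sample_rows$bill,
                          function(s) length(unique(s)))
bill_crossed <- all(states_per_bill > 1)

# Formulas only (not fitted here). With unique Senator labels, the nested
# shorthand and the explicit version describe the same grouping.
f_nested   <- vote ~ 1 + (1 | state/Senator) + (1 | bill)
f_explicit <- vote ~ 1 + (1 | state) + (1 | Senator) + (1 | bill)
```

In lme4, (1 | state/Senator) expands to (1 | state) + (1 | state:Senator). When Senator names are unique across states, as they appear to be in this sample, state:Senator has the same levels as Senator, so both formulas should give the same fit.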
2. This does not seem to be a balanced design: some Senators lost their
seat in the period 2003-2008, so that many of them did not vote upon all
three of the bills. In other words, for many Senator-bill-combinations,
there are no data. Should this affect my interpretation of the results?
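The extent of the imbalance can be tabulated before fitting. The following is a minimal base R sketch on the Florida rows of the attached sample, assuming that 1 codes 'yea' and 0 codes 'nay'; the names votes, cells, bills_per_senator and empty_cells are illustrative only.

```r
# Florida rows from the attached sample (assumed coding: 1 = yea, 0 = nay)
votes <- data.frame(
  Senator = c("Bill Nelson", "Bob Graham", "Bill Nelson", "Mel Martinez",
              "Bill Nelson", "Mel Martinez"),
  bill    = c("CSA2003", "CSA2003", "ACSA2008", "ACSA2008",
              "CSA2005", "CSA2005"),
  vote    = c(1, 1, 1, 1, 1, 0)
)

# Number of recorded votes in each Senator-by-bill cell
cells <- table(votes$Senator, votes$bill)

# How many of the three bills each Senator voted on
bills_per_senator <- rowSums(cells > 0)

# Senator-bill combinations with no data at all
empty_cells <- sum(cells == 0)
```

Here Bob Graham voted on one bill and Mel Martinez on two, so three of the nine Senator-bill cells are empty. As far as I understand, glmer does not require a balanced design and simply uses the rows that are present, but Senators with a single vote contribute little information about their own random intercept.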
Best regards,
Clara Vandeweerdt
Master in Comparative and International Politics, 2013
Faculty of Social Sciences
KU Leuven
Belgium
-------------- next part --------------
state bill      Senator               vote
WA    CSA2003   Patty Murray          1
WA    CSA2003   Maria Cantwell        1
WA    ACSA2008  Patty Murray          1
WA    ACSA2008  Maria Cantwell        1
WA    CSA2005   Patty Murray          1
WA    CSA2005   Maria Cantwell        1
DE    CSA2003   Joseph Biden          1
DE    CSA2003   Thomas Carper         1
DE    ACSA2008  Joseph Biden          -1
DE    ACSA2008  Thomas Carper         1
DE    CSA2005   Joseph Biden          1
DE    CSA2005   Thomas Carper         1
WI    CSA2003   Herbert Herb Kohl     1
WI    CSA2003   Russell Feingold      1
WI    ACSA2008  Herbert Herb Kohl     1
WI    ACSA2008  Russell Feingold      1
WI    CSA2005   Herbert Herb Kohl     1
WI    CSA2005   Russell Feingold      0
WV    CSA2003   John Jay Rockefeller  1
WV    CSA2003   Robert Byrd           0
WV    ACSA2008  John Jay Rockefeller  1
WV    ACSA2008  Robert Byrd           -1
WV    CSA2005   John Jay Rockefeller  1
WV    CSA2005   Robert Byrd           0
HI    CSA2003   Daniel Akaka          1
HI    CSA2003   Daniel Inouye         1
HI    ACSA2008  Daniel Akaka          1
HI    ACSA2008  Daniel Inouye         1
HI    CSA2005   Daniel Akaka          1
HI    CSA2005   Daniel Inouye         1
FL    CSA2003   Bill Nelson           1
FL    CSA2003   Bob Graham            1
FL    ACSA2008  Bill Nelson           1
FL    ACSA2008  Mel Martinez          1
FL    CSA2005   Bill Nelson           1
FL    CSA2005   Mel Martinez          0