[R-sig-ME] multilevel time series?

Fri Oct 1 13:08:22 CEST 2010

Many thanks to both of you. However, I realise now that my description of the data was misleading. When I said "each observed about 15 or 16 times over about a 30-year period", I was referring to *states*, not individuals. So each individual survey respondent appears only once in the dataset, but the states from which samples of respondents are drawn are observed multiple times. (Conceivably, a given person might be sampled more than once, but if that occurs at all it will be extremely rare, and the dataset we have won't tell us this anyway.)

Even so, your answers and suggestions have been very interesting and useful. Consider:

The models we were using looked like:
lmer(outcome ~ covariates + time + (1 | stateyear) + (1 | state), data=data) # time is a linear effect, measured in (whole) years from the earliest year in the dataset

If I understand correctly, Thierry suggests:
lmer(outcome ~ covariates + (1 | stateyear) + (1 | state) + (1 | time), data=data)

I tried this, and the variance at the time level is small, but not zero, implying there are some year-specific disturbances that are common across states. I also got similar results from:
lmer(outcome ~ covariates + time + (1 | stateyear) + (1 | state) + (1 | time), data=data) # time as *both* a linear fixed effect and a random effect

However, Doug seems to suggest:
lmer(outcome ~ covariates + time + (1 | stateyear) + (time | state), data=data) # and maybe also with a fixed effect and random slope for time^2

Both suggestions seem reasonable to me, and in fact I ran a model combining them, and got sensible results:
lmer(outcome ~ covariates + time + (1 | stateyear) + (time | state) + (1 | time), data=data)
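
To make the grouping structure in these formulas concrete, here is a minimal base-R sketch of how `time` and `stateyear` could be derived and how they relate to `state`. The toy data frame and its column names (`resp`, `year`, `state`) are hypothetical and stand in for the real survey data; only the variable names `time`, `stateyear` and `state` come from the models above.

```r
# Hypothetical toy data: 3 states, each surveyed in the same 4 years,
# with 5 respondents per state-year. Not the real dataset.
toy <- expand.grid(resp  = 1:5,
                   year  = c(1980, 1983, 1986, 1990),
                   state = c("AL", "AZ", "AR"))

# time: whole years since the earliest year in the data
toy$time <- toy$year - min(toy$year)

# stateyear: one level per observed state-by-year combination
toy$stateyear <- interaction(toy$state, toy$year, drop = TRUE)

# Each state-year belongs to exactly one state (nested in state) ...
nested_in_state <- all(tapply(as.character(toy$state), toy$stateyear,
                              function(s) length(unique(s))) == 1)

# ... while state and time are crossed: every state appears at every time
crossed <- all(table(toy$state, toy$time) > 0)

n_stateyear <- nlevels(toy$stateyear)
```

In a term such as `(1 | time)`, the numeric `time` variable acts as a grouping factor with one level per distinct year, while the same variable in the fixed part enters as a linear trend.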

My co-author and I are satisfied with this, but our reviewer implied he/she wanted to see the model with an AR1 structure, and we're stumped about how to do this. The nlme package can include AR1, but only where the autocorrelation is across lowest-level units, and in our case the autocorrelation is across higher-level units.
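
One workaround raised earlier in the thread is to aggregate to a single value per state-year, so that state-years become the lowest-level units and nlme's `corAR1` applies. The following is only a sketch of that two-level idea on simulated data, not the three-level model we actually want. The data frame `agg`, the column `ybar` and all simulation settings are assumptions made for illustration, and the sketch assumes equally spaced years, which the real data do not have (nlme's `corCAR1` may be worth a look for unequal spacing).

```r
library(nlme)  # recommended package, shipped with standard R installations

set.seed(42)
n_state <- 48   # number of states, as in the data described
n_year  <- 20   # hypothetical: one aggregate per state for 20 consecutive years

# One row per state-year; time varies fastest within each state
agg <- expand.grid(time = 0:(n_year - 1), state = factor(seq_len(n_state)))

# Simulated state-level intercepts
state_eff <- rnorm(n_state, sd = 1)

# Simulated AR(1) disturbances within each state (phi = 0.5 is an assumption)
ar_err <- unlist(lapply(seq_len(n_state), function(i)
  as.numeric(arima.sim(model = list(ar = 0.5), n = n_year))))

agg$ybar <- 2 + 0.05 * agg$time + state_eff[as.integer(agg$state)] + ar_err

# Random intercept per state plus AR(1) errors across years within state
fit <- lme(ybar ~ time, random = ~ 1 | state,
           correlation = corAR1(form = ~ time | state), data = agg)

# Estimated autocorrelation parameter on its natural (-1, 1) scale
phi_hat <- coef(fit$modelStruct$corStruct, unconstrained = FALSE)
```

As noted in the original question, this discards the respondent-level information, so it is more a diagnostic for autocorrelation at the state-year level than a replacement for the lmer models above.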

Doug, can you suggest a reference that we might use to justify our choice of model? (That is, assuming you still agree with what we're doing, given the revised description of the data provided above.)

The suggestions are very much appreciated.
- Malcolm

On 30 Sep 2010, at 20:56, Douglas Bates wrote:

> On Mon, Sep 27, 2010 at 3:34 AM, ONKELINX, Thierry
> <Thierry.ONKELINX at inbo.be> wrote:
>> Dear Malcolm,
> 
>> Your design requires IMHO crossed random effects instead of nested
>> random effects. Individual is clearly crossed with year. Each individual
>> can be surveyed in more than one year and vice versa. If they were
>> nested, all data from a specific individual would come from only one
>> specific year. The same goes for state and year, they are rather crossed
>> than nested.
> 
> Malcolm's original description mentions modeling a linear trend in
> time, which would make sense to me.  Even taking into account the fact
> that a person can move from one state to another (hence you don't have
> strict nesting of the person and state factors) such data can still be
> analyzed using lme4.  Before doing so I would want to plot response
> versus time for several individuals, just to see if a linear trend
> looks adequate.  Having 15 to 20 different time points per subject
> would allow you to model more than a linear trend within subject.
> 
> Sometimes people will approach such a case using time series methods,
> even though the series are rather short.  Simple relationships like an
> AR1 (first-order autoregressive) model generate marginal covariance
> patterns that are very similar to that generated by a model with
> per-subject random effects for the intercept and the slope with
> respect to time.  This is why I don't usually combine these terms.  It
> is hard to separate out the effect of each.
> 
> Your suggestion is somewhat different.  It is more like a panel data
> type of model and could definitely be appropriate if the effect of a
> particular year was more-or-less common across subjects.  This type of
> model is applied to data like the quarterly profits of several
> companies.  Macro-economic forces can (and did) have industry-wide
> effects on the Q1 results in 2009 so it makes sense to regard each
> time period as distinct.
> 
> If, on the other hand, you had time trends within individuals but not
> synchronized across time periods then I would set up a model for the
> within-subject time trends and try to incorporate random effects in
> that model, as Malcolm seems to indicate they have done.
> 
>> Fitting year as a crossed random effect will take nonstationarity along
>> time into account. The size of variance of this random effect will
>> indicate how strong this nonstationarity is.
>>> -----Original message-----
>>> From: r-sig-mixed-models-bounces at r-project.org
>>> [mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of
>>> Malcolm Fairbrother
>>> Sent: Sunday 26 September 2010 21:18
>>> To: r-sig-mixed-models at r-project.org
>>> Subject: [R-sig-ME] multilevel time series?
>>> 
>>> Dear all,
>>> 
>>> In macro-social science, it's become fairly conventional to
>>> analyse repeated cross-sectional survey data using
>>> three-level models. Individual survey respondents (level-1)
>>> are nested in state-years (level-2), which are in turn nested
>>> within states (level-3). One big pay-off is the ability to
>>> examine how time-constant or time-varying state-level
>>> variables affect level-1 outcomes.
>>> 
>>> A co-author and I recently had a reviewer question whether
>>> this approach is adequate, however. He/she suggested that
>>> this approach could generate very misleading results, if the
>>> data are nonstationary. (We just included a linear time
>>> effect in our models.) So I'm thinking about how to proceed
>>> (and I'm not particularly knowledgeable about time series
>>> analysis). Any advice would be much appreciated. We used lme4
>>> to fit the models in our paper, and we have several tens of
>>> thousands of respondents nested in 48 states, each observed
>>> about 15 or 16 times over about a 30-year period.
>>> 
>>> (1) Is the reviewer's query valid? Is he/she right to question this
>>> approach?
>>> 
>>> (2) How might we test for nonstationarity? The reviewer
>>> mentioned differencing the outcome variable, but in a
>>> multilevel context I'm not sure how to do that... Perhaps we
>>> could calculate an *aggregate* value for every state-year,
>>> and check the aggregated data for autocorrelation? My
>>> understanding is that autocorrelation across multiple lags is
>>> a strong indicator of nonstationarity (while, conversely, the
>>> absence of multiple-lag autocorrelation is almost a guarantee
>>> of stationarity). I believe this can be done with nlme, as a
>>> two-level model, with state-years nested within states.
>>> 
>>> (3) However, that approach would seem to throw away a lot of
>>> level-1 information (about individual respondents), and I'm
>>> not sure about the implications for any significance tests.
>>> An alternative approach would seem to be "multilevel time
>>> series", where autocorrelation at the *group* rather than
>>> individual/first level is specifically allowed for in the
>>> model. However, I can't find any references to R packages (or
>>> other software) that allow for the specification of, for
>>> example, AR1 processes at anything other than level-1 in
>>> multilevel models.
>>> 
>>> In short, I'd be curious to hear what people think...
>>> (especially if anyone out there happens to be a whiz at both
>>> multilevel and time series analysis). I hope I've been clear
>>> about the problem, but I'm happy to elaborate. Thanks in
>>> advance for any help.
>>> 
>>> Cheers,
>>> Malcolm
>>> 
>>> 
>>> Dr Malcolm Fairbrother
>>> Lecturer
>>> School of Geographical Sciences
>>> University of Bristol
>>> 
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>> 