[R-sig-ME] panel data with spatially patterned errors

Wed Jun 22 02:33:15 CEST 2011

Dear list,

How would you model the following data structure?

I have four waves of observations on a set of units, where units are nested within regions. Units do not move from region to region, for reasons that will become obvious below. So far so good: I should be able to fit growth curves, with a higher-level random intercept for region. However, the complication is that the units are gridcells, with known locations (longitude and latitude), and gridcells that are close to each other tend to have similar errors (as verified by semivariograms, etc.). So I would ideally like to allow for--and indeed investigate--the spatial correlation in the errors.

The lme function in nlme allows for a variety of spatial error structures, but (a) requires each lowest-level unit to have a different location, and (b) cannot deal with crossed random effects. The combination of (a) and (b) presents a challenge. Consider:

M1 <- lme(y ~ 1, random = ~ 1 | region/gridcell, correlation = corExp(form = ~ long + lat | region), data=data)

M2 <- lme(y ~ 1, random = ~ 1 | region/region.year/gridcell, correlation = corExp(form = ~ long + lat | region.year), data=data)

M1 won't work because gridcells do not have unique (non-repeated) locations within regions--since each gridcell is observed four times. Consequently, nlme finds units with zero distance to other units--something it doesn't like. In M2, units have unique locations (because each gridcell is observed only once in each "region.year"), but this model won't work because gridcells are not uniquely nested within region.years: each gridcell is observed in four different region.years. (In practice, nlme will allow M2, but only because it ignores that gridcell #1 in region A in year 1 is the same as gridcell #1 in region A in year 2--just at a different time. It treats them as wholly different entities.)

If all of the above is correct, I'm not sure how to proceed. Am I missing an easy resolution? It doesn't seem like such a far-fetched data structure. I had the thought that the solution might lie in a new/custom correlation structure (which Pinheiro and Bates say can be coded), but initial investigations suggest that writing such classes is far from straightforward, and it seems like it might be overkill.

Any suggestions would be much appreciated.

- Malcolm