[R-sig-Geo] lagsarlm with simulated data

Tue Aug 17 19:54:34 CEST 2010

On Tue, 17 Aug 2010, Malcolm Fairbrother wrote:

> Dear Roger (and any other interested parties),
>
> Thanks very much for responding. I tried adding the scale() argument you 
> suggested, but that didn't seem to make any difference. I've set a seed, 
> and to make it even more easily reproducible, I've loosely followed code 
> from: 
> http://www.mail-archive.com/r-sig-geo@stat.math.ethz.ch/msg00799.html. 
> The contiguities (in addition to the data) are now generated directly 
> using the code below.
>
> I'm still getting consistent upward bias for the Intercept, and 
> otherwise perfect recoveries of the data-generating parameters. I tried 
> reversing the sign of rho, and that didn't make any difference.
>
> Any ideas? Sorry to bother you with this, but I'd like to know why the 
> simulation is generating this result.

Malcolm,

The reproducible code does make life easier, thanks. I don't have a 
solution, but see also:

Xbe <- X0*Bs[1]+X1*Bs[2]+X2*Bs[3]+X3*Bs[4]+e
y <- Xbe
lm_res1 <- lapply(1:sims, function(i) lm(y[,i] ~ X1[,i] + X2[,i] +
  X3[,i]))
apply(do.call("rbind", lapply(lm_res1, coefficients)), 2, mean)
apply(do.call("rbind", lapply(lm_res1, coefficients)), 2, var)

which exhibits somewhat similar traits in the variance of the intercept. 
I've also tried an intercept-only scenario without more light appearing, 
and a fixed X scenario. The intercept appears to "soak up" other variance. 
In the lag case, this gets diffused/smoothed through (I - \rho W)^{-1} in 
addition.

Other ideas, anyone?

Roger

>
> Thanks again,
> Malcolm
>
>
> library(spdep)
> sims <- 1000 # set number of simulations
> rho <- 0.2 # set autocorrelation coefficient
> Bs <- c(2, 5, 3, -2) # set a vector of betas
> nb7rt <- cell2nb(7, 7, torus=TRUE) # generate contiguities
> listw <- nb2listw(nb7rt)
> mat <- nb2mat(nb7rt) # create contiguity matrix
> set.seed(20100817)
> e <- matrix(rnorm(sims*length(nb7rt)), nrow=length(nb7rt)) # create random errors
> e <- scale(e, center=TRUE, scale=FALSE) # constrain mean of errors to zero
> X0 <- matrix(1, ncol=sims, nrow=length(nb7rt)) # create Intercept
> X1 <- matrix(rnorm(sims*length(nb7rt), 4, 2), nrow=length(nb7rt)) # generate some covariates
> X2 <- matrix(rnorm(sims*length(nb7rt), 2, 2), nrow=length(nb7rt))
> X3 <- matrix(rnorm(sims*length(nb7rt), -3, 1), nrow=length(nb7rt))
> Xbe <- X0*Bs[1]+X1*Bs[2]+X2*Bs[3]+X3*Bs[4]+e
> y <- solve(diag(length(nb7rt)) - rho*mat) %*% Xbe # generate lagged y data
> lag_res1 <- lapply(1:sims, function(i) lagsarlm(y[,i] ~ X1[,i] + X2[,i] + X3[,i], listw=listw)) # fit the model
> apply(do.call("rbind", lapply(lag_res1, coefficients)), 2, mean) # mean estimates are excellent, except Int
> apply(do.call("rbind", lapply(lag_res1, coefficients)), 2, var) # variance is also a lot higher for Int
>
>
>
>
>
> On 16 Aug 2010, at 19:53, Roger Bivand wrote:
>
>> On Mon, 16 Aug 2010, Malcolm Fairbrother wrote:
>>
>>> Dear list,
>>>
>>> I am running some simulations, trying to use lagsarlm (from the spdep package) to recover the parameters used to generate the data. In a basic simulation I am running, I am finding that I am able to recover rho almost perfectly, and all but one of the betas perfectly. However, the beta attached to the constant is substantially biased, for reasons I cannot understand.
>>>
>>> The code I am using is below. The spatial weights matrix is from the 48 contiguous U.S. states (rows sum to 1). Can anyone see where I am going wrong? Or is the biased B0 coefficient somehow a consequence of the particular neighbourhood structure I'm using? Any help or tips would be much appreciated.
>>
>> Malcolm,
>>
>> You didn't set a seed, so I can't reproduce this exactly, but I think that the mean of e will be added to your constant, won't it?
>>
>> n <- 48
>> e <- matrix(rnorm(n*1000, mean=0, sd=4), 1000, n)
>> summary(apply(e, 1, mean))
>>
>> Could you do a scale(e, center=TRUE, scale=FALSE) to force it to mean zero?
>>
>> Hope this helps,
>>
>> Roger
>>
>>>
>>> - Malcolm
>>>
>>>
>>>> n <- dim(W)[1] # sample size
>>>> Bs <- c(2, 5, 3, -2) # vector of Beta coefficients
>>>> rho <- 0.2 # set autocorrelation coefficient
>>>> bres <- matrix(NA, nrow=1000, ncol=5)
>>>> for (i in 1:1000) {
>>> + e <- rnorm(n, mean=0, sd=4)
>>> + X1 <- rnorm(n, 4, 2) # create some independent variables
>>> + X2 <- rnorm(n, 2, 2)
>>> + X3 <- rnorm(n, -3, 1)
>>> + X <- cbind(rep(1, n), X1, X2, X3)
>>> + y <- (solve(diag(n)-rho*W)) %*% ((X%*%Bs)+e) # generate lagged Ys
>>> + data <- as.data.frame(cbind(y, X))
>>> + lagmod <- lagsarlm(y ~ X1 + X2 + X3, data=data, oxw.listw1)
>>> + bres[i,] <- coefficients(lagmod)
>>> + }
>>>> apply(bres, 2, mean)
>>> [1]  0.1876292  2.6275783  4.9897827  3.0060831 -1.9883803
>>>> apply(bres, 2, median)
>>> [1]  0.1907057  2.4000895  4.9887496  3.0043258 -2.0076960
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>
>> --
>> Roger Bivand
>> Economic Geography Section, Department of Economics, Norwegian School of
>> Economics and Business Administration, Helleveien 30, N-5045 Bergen,
>> Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
>> e-mail: Roger.Bivand at nhh.no
>>
>
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no