[R-sig-Geo] Comparison coefficients from OLS and Spatial lag model

Roger Bivand Roger.Bivand at nhh.no
Fri Apr 20 19:46:50 CEST 2012


On Fri, 20 Apr 2012, David Marguerit wrote:

> Hi dear List members,
>
> I am trying to run a spatial lag regression using the spdep package for R,
> version 0.5-41. My computer has the following configuration:
> Windows XP Service Pack 3
> Intel Pentium 4 CPU 2.80 GHz
> 1.99 GB of RAM
>
> I have a geocoded (longitude and latitude) individual database with n=2696.
> On the left-hand side, I have the distance between each individual and the
> nearest polluting industry, and on the right-hand side I have 7 explanatory
> variables.
> When I compare coefficients obtained from OLS and the Spatial Lag Model I
> can see huge differences. Is it normal?

Yes, but the models are different. You may compare spatial error 
coefficients with linear model coefficients, but not spatial lag model
coefficients (see LeSage & Pace 2009).
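
Why the lag coefficients cannot be read like OLS coefficients can be sketched
with the numbers in the output quoted below. In a lag model
y = rho W y + X beta + e, a change in x_k feeds back through the neighbours.
With row-standardised weights (style "W", every observation having at least
one neighbour), the average total impact of x_k is beta_k / (1 - rho)
(LeSage & Pace 2009). The snippet is a minimal base R sketch, not the poster's
code: the variable names are invented for illustration, and the values are
copied from the quoted summaries for nonblanc.

```r
# Values copied from the quoted lagsarlm() and lm() summaries (nonblanc)
rho      <- 0.91255       # spatial lag parameter (Rho)
beta_lag <- -0.01971576   # coefficient in the spatial lag model
beta_ols <- -0.22557      # coefficient in the linear model

# Average total impact under row-standardised weights: beta / (1 - rho)
total_impact <- beta_lag / (1 - rho)

print(round(c(lag_coef = beta_lag,
              total_impact = total_impact,
              ols_coef = beta_ols), 4))
```

The total impact comes out at about -0.225, so the raw lag coefficient
(about -0.020) is not the quantity to set beside the OLS estimate. With the
fitted model at hand, impacts(reg.slm, listw = listw) should report direct,
indirect and total impacts for all variables.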

Your response is constructed in a really weird way: you are trying to
account for the distance between individuals and polluting industry. The
autocorrelation is of course being created by you, when you define the 
response in this way. In addition, the distances are not going to change 
if you change the explanatory variables. I think that your model should be 
reconsidered completely, with distance to polluting industry as an 
explanatory variable, but I don't know what your response would be.
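
That the autocorrelation follows from how the response is defined can be
checked with a small simulation. The sketch below is base R only and entirely
hypothetical: the point locations, the five "industry" sites and the
3-nearest-neighbour weights are made up, and are neither the poster's data nor
the sphere of influence weights used in the quoted code. It computes Moran's I
for a response defined as the distance to the nearest site.

```r
set.seed(1)
n   <- 500
xy  <- cbind(runif(n), runif(n))   # hypothetical individual locations
src <- cbind(runif(5), runif(5))   # hypothetical polluting sites

# Response: distance from each individual to the nearest site
d_src <- sqrt(outer(xy[, 1], src[, 1], "-")^2 +
              outer(xy[, 2], src[, 2], "-")^2)
y <- apply(d_src, 1, min)

# Row-standardised weights: the 3 nearest neighbours of each point
D <- as.matrix(dist(xy))
diag(D) <- Inf                     # a point is not its own neighbour
W <- matrix(0, n, n)
for (i in seq_len(n)) W[i, order(D[i, ])[1:3]] <- 1 / 3

# Moran's I = (n / S0) * z'Wz / z'z, with S0 = sum of all weights
z <- y - mean(y)
moran_I <- (n / sum(W)) * sum(z * (W %*% z)) / sum(z^2)
print(moran_I)
```

Moran's I should come out high here although no covariate is involved, which
is the sense in which the dependence is built into the response.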

You simply cannot model in this way; look at what Waller & Gotway (2004)
do, as for example in spdep in ?NY_data.

Hope this clarifies,

Roger

>
> The following lines summarize my program:
>
> Firstly, I run an OLS regression:
>
>> library(foreign)  # provides read.dta()
>> library(spdep)    # neighbours, weights, tests and lagsarlm()
>> tab<-read.dta("coord_modif.dta")
>> form.lin<-as.formula(log(entr)~log(rev_hab)+edu2+nonblanc+nonmaison+matr2+ocup2+ocup3+nonowner)
>> ols.lin<-lm(form.lin,data=tab)
>
>> summary (ols.lin)
> Call:
> lm(formula = form.lin, data = tab)
>
> Residuals:
>     Min       1Q   Median       3Q      Max
> -3.12600 -0.42976  0.06275  0.48021  1.87649
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   0.71921    0.19842   3.625 0.000295 ***
> log(rev_hab)  0.06113    0.01813   3.372 0.000757 ***
> edu2         -0.13266    0.05570  -2.381 0.017315 *
> nonblanc     -0.22557    0.03892  -5.795 7.61e-09 ***
> nonmaison    -0.09585    0.03794  -2.526 0.011582 *
> matr2        -0.04329    0.02902  -1.492 0.135905
> ocup2         0.06978    0.08512   0.820 0.412425
> ocup3         0.09880    0.02901   3.406 0.000670 ***
> nonowner     -0.09370    0.04046  -2.316 0.020625 *
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 0.7101 on 2687 degrees of freedom
> Multiple R-squared: 0.0562,	Adjusted R-squared: 0.05339
> F-statistic:    20 on 8 and 2687 DF,  p-value: < 2.2e-16
>
>
> Then, I create a weights matrix in order to check for the presence of
> spatial autocorrelation. I use the sphere of influence method
> described in *Applied Spatial Data Analysis with R* (Bivand et al., 2008):
>
>> coords<-cbind(tab$long_dav, tab$lat_dav)
>> nb.temp<-tri2nb(coords)
>> nb<-graph2nb(soi.graph(nb.temp, coords))
>> plot(coords, col="red")
>> plot(nb, coords,add=TRUE)
>> title(main="Sphere of Influence Graph")
>> nb_1<-list(SOI=nb)
>> sapply(nb_1,function(x) is.symmetric.nb(x, verbose=FALSE, force=TRUE))
>
> SOI
> TRUE
>
>> listw<-nb2listw(nb,style="W")
>> summary(listw)
>
> Characteristics of weights list object:
> Neighbour list object:
> Number of regions: 2696
> Number of nonzero links: 7186
> Percentage nonzero weights: 0.09886611
> Average number of links: 2.66543
> Link number distribution:
>
>  1   2   3   4   5   6   7   8   9
> 452 889 759 382 150  50  12   1   1
> 452 least connected regions:
> 2 8 16 22 23 33 35 39 66 68 70 78 80 83 84 95 101 106 110 112 120 124
> 126 137 140 141 146 162 163 168 177 180 190 196 197 201 206 207 213
> 218 220 225 235 241 254 255 260 261 268 279 282 285 286 290 298 305
> 314 319 327 332 335 339 344 366 373 382 383 394 395 401 402 403 404
> 405 406 409 411 415 417 421 422 424 430 431 440 471 474 482 483 484
> 486 492 498 499 502 507 513 515 526 539 540 541 545 546 547 552 554
> 556 578 585 600 602 604 607 613 616 625 629 658 668 669 676 683 686
> 696 698 701 704 718 723 729 739 751 757 759 760 766 768 773 776 779
> 787 792 793 798 799 802 806 807 809 811 829 835 840 851 855 860 863
> 868 881 888 893 895 896 903 910 913 922 938 946 947 959 970 977 990
> 992 995 1000 1008 1018 1022 1023 1026 1027 1028 1031 1038 1039 1046
> 1048 1051 1080 1097 1104 1110 1111 1120 1145 1147 1165 1171 1178 1182
> 1188 1194 1196 1201 1202 1220 1222 1227 1230 1242 1246 1267 1269 1271
> 1275 1280 1289 1290 1299 1302 1303 1304 1305 1318 1327 1328 1348 1386
> 1391 1394 1395 1396 1406 1425 1432 1440 1441 1442 1451 1461 1474 1479
> 1487 1490 1495 1503 1506 1513 1518 1528 1539 1546 1551 1552 1559 1560
> 1571 1583 1585 1588 1593 1594 1604 1605 1606 1608 1611 1615 1617 1620
> 1631 1640 1644 1646 1649 1650 1655 1662 1676 1678 1683 1684 1693 1709
> 1713 1720 1725 1732 1740 1751 1770 1779 1781 1790 1794 1815 1831 1833
> 1835 1836 1837 1841 1848 1850 1855 1857 1860 1865 1871 1893 1911 1913
> 1920 1926 1933 1934 1936 1947 1949 1951 1953 1960 1962 1963 1964 1965
> 1972 1981 1991 2000 2004 2014 2017 2021 2032 2033 2046 2056 2058 2060
> 2061 2065 2066 2076 2077 2090 2091 2099 2102 2107 2108 2118 2122 2131
> 2133 2135 2136 2148 2151 2181 2182 2186 2190 2196 2207 2209 2214 2218
> 2224 2226 2230 2233 2235 2238 2242 2243 2248 2250 2258 2262 2268 2290
> 2292 2295 2296 2300 2329 2332 2346 2352 2355 2370 2383 2390 2393 2394
> 2398 2404 2413 2416 2418 2422 2440 2441 2442 2444 2450 2454 2460 2464
> 2465 2492 2498 2508 2510 2512 2523 2529 2531 2543 2547 2551 2552 2554
> 2556 2580 2582 2598 2603 2615 2618 2624 2627 2631 2634 2635 2637 2644
> 2645 2648 2653 2654 2655 2668 2673 2681 2688 2689 2692 with 1 link
> 1 most connected region:
> 2045 with 9 links
>
> Weights style: W
> Weights constants summary:
>     n      nn   S0       S1       S2
> W 2696 7268416 2696 2445.314 11176.87
>
>
> Lastly, I run the Moran's I test, the LM test and the spatial lag
> regression:
>
>> moran.ols<-lm.morantest(ols.lin,listw)
>> summary(moran.ols)
>             Length Class  Mode
> statistic   1      -none- numeric
> p.value     1      -none- numeric
> estimate    3      -none- numeric
> method      1      -none- character
> alternative 1      -none- character
> data.name   1      -none- character
>> print(moran.ols)
> 	Global Moran's I for regression residuals
>
> data:
> model: lm(formula = form.lin, data = tab)
> weights: listw
>
> Moran I statistic standard deviate = 50.1449, p-value < 2.2e-16
> alternative hypothesis: greater
> sample estimates:
> Observed Moran's I        Expectation           Variance
>      0.9184522908      -0.0006379116       0.0003359414
>
>> test.lm<-lm.LMtests(ols.lin,listw,test=c("LMerr","RLMerr","LMlag","RLMlag","SARMA"))
>> print(test.lm)
> 	Lagrange multiplier diagnostics for spatial dependence
>
> data:
> model: lm(formula = form.lin, data = tab)
> weights: listw
>
> LMerr = 2507.369, df = 1, p-value < 2.2e-16
>
>
> 	Lagrange multiplier diagnostics for spatial dependence
>
> data:
> model: lm(formula = form.lin, data = tab)
> weights: listw
>
> RLMerr = 7.4774, df = 1, p-value = 0.006248
>
>
> 	Lagrange multiplier diagnostics for spatial dependence
>
> data:
> model: lm(formula = form.lin, data = tab)
> weights: listw
>
> LMlag = 2646.837, df = 1, p-value < 2.2e-16
>
>
> 	Lagrange multiplier diagnostics for spatial dependence
>
> data:
> model: lm(formula = form.lin, data = tab)
> weights: listw
>
> RLMlag = 146.945, df = 1, p-value < 2.2e-16
>
>
> 	Lagrange multiplier diagnostics for spatial dependence
>
> data:
> model: lm(formula = form.lin, data = tab)
> weights: listw
>
> SARMA = 2654.314, df = 2, p-value < 2.2e-16
>
>> reg.slm<-lagsarlm(form.lin, data=tab,listw=listw)
>> summary(reg.slm)
> Call:lagsarlm(formula = form.lin, data = tab, listw = listw)
>
> Residuals:
>      Min        1Q    Median        3Q       Max
> -1.558639 -0.042609  0.019559  0.071850  1.367678
>
> Type: lag
> Coefficients: (asymptotic standard errors)
>                Estimate  Std. Error z value Pr(>|z|)
> (Intercept)   0.01565469  0.04231648  0.3699  0.71142
> log(rev_hab)  0.00990161  0.00385757  2.5668  0.01026
> edu2         -0.01153030  0.01184878 -0.9731  0.33049
> nonblanc     -0.01971576  0.00828432 -2.3799  0.01732
> nonmaison    -0.01407983  0.00807180 -1.7443  0.08110
> matr2        -0.00027776  0.00617216 -0.0450  0.96411
> ocup2         0.04241982  0.01810448  2.3431  0.01913
> ocup3         0.00732140  0.00617091  1.1864  0.23545
> nonowner     -0.01439652  0.00860479 -1.6731  0.09431
>
> Rho: 0.91255, LR test value: 6394.9, p-value: < 2.22e-16
> Asymptotic standard error: 0.0033595
>    z-value: 271.63, p-value: < 2.22e-16
> Wald statistic: 73784, p-value: < 2.22e-16
>
> Log likelihood: 299.2953 for lag model
> ML residual variance (sigma squared): 0.022811, (sigma: 0.15103)
> Number of observations: 2696
> Number of parameters estimated: 11
> AIC: -576.59, (AIC for lm: 5816.3)
> LM test for residual autocorrelation
> test value: 0.00077273, p-value: 0.97782
>
>
> I get huge differences in the coefficients between the OLS and Spatial
> Lag Model (e.g. nonblanc: -0.225 vs -0.019). Does anyone know whether this
> is normal?
>
> Thank you very much for your help
> Marguerit David
> PhD student
> University Paris Dauphine

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no


