[R-sig-Geo] Spatial Lags Excluding Neighbors' Missing Attribute Values
Christopher Moore
moor0554 at umn.edu
Sat Dec 27 22:45:32 CET 2008
Greetings,
I am seeking help with creating spatial lags that exclude neighbors with
missing attribute values. I searched the spdep help documentation and the R
Sig Geo Archives but was unable to find relevant information. (Please note
that I posted an R Sig Geo search engine here:
<http://mycroft.mozdev.org/search-engines.html?name=r+sig+geo>.)
As shown in the example below, the lag.listw function converts missing
attribute values to zero before calculating the mean of neighbhors'
attribute values (i.e., the spatial lag).
Do you have any advice for calculating spatial lags so that neighbors with
missing attribute values are excluded (rather than treated as zero)?
Alternatively, how might I remove neighbor links when the neighbor is
missing an attribute value? I would prefer the former approach (exclusion
from the calculation) because it wouldn't require creating a new neighbor
list each time a new attribute is analyzed. However, the latter approach
(editing the neighbor list) seems easier, if only I could get the list into
a data frame-like object for editing, which I can not figure out how to do.
I would greatly appreciate your help and have provided a reproducible
example below (see the very bottom of the of the example in particular).
Please let me know if you need further clarification.
Regards,
Chris
--
Christopher Moore, M.P.P.
Doctoral Student
Quantitative Methods in Education
University of Minnesota
moor0554 at umn.edu
http://www.tc.umn.edu/~moor0554/
> ##########
>
> library(spdep)
>
> ##process and summarize nc.sids data nc.sids <-
> readShapePoly(system.file("etc/shapes/sids.shp", package="spdep")[1],
> ID="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66")) rn <-
> sapply(slot(nc.sids, "polygons"), function(x) slot(x, "ID")) ncCR85_nb <-
> read.gal(system.file("etc/weights/ncCR85.gal", package="spdep")[1],
> region.id=rn) summary(ncCR85_nb, coordinates(nc.sids))
Neighbour list object:
Number of regions: 100
Number of nonzero links: 492
Percentage nonzero weights: 4.92
Average number of links: 4.92
Link number distribution:
1 2 3 4 5 6 7 8 9
2 6 15 16 23 18 16 2 2
2 least connected regions:
37053 37055 with 1 link
2 most connected regions:
37097 37125 with 9 links
Summary of link distances:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1197 0.3407 0.4018 0.4170 0.4922 0.7843
The decimal point is 1 digit(s) to the left of the |
1 | 22
1 | 778888888899
2 | 4444
2 | 5555555566666666667777778888888888888899999999
3 | 00000000000000111111111111111111112222222222223333333344444444
3 |
55555555555555555555555555555555666666666666666666666677777777777777+26
4 | 00000000000000000011111111111111111111112222222222222233333333333344
4 | 55555555555555555566666666667777777777777777888888889999999999
5 | 00000000111111111122222222222222222233333333334444
5 | 5555555555556666666666667788888899
6 | 00111122334444
6 | 5555557788999999
7 | 000044
7 | 88
> wts <- nb2listw(ncCR85_nb)
> str(wts$neighbours)
List of 100
$ : int [1:6] 17 19 41 68 76 79
$ : int [1:4] 14 18 49 97
$ : int [1:3] 5 86 97
$ : int [1:4] 62 77 84 90
$ : int [1:3] 3 95 97
$ : int [1:5] 12 14 56 61 95
$ : int [1:6] 25 48 59 69 74 94
$ : int [1:5] 42 46 59 66 94
$ : int [1:5] 24 26 71 78 82
$ : int [1:3] 24 65 71
$ : int [1:7] 44 45 56 58 81 88 100
$ : int [1:7] 6 14 18 23 55 56 81
$ : int [1:5] 49 60 80 84 90
$ : int [1:6] 2 6 12 18 95 97
$ : int [1:3] 27 37 70
$ : int [1:3] 25 52 67
$ : int [1:5] 1 41 68 73 79
$ : int [1:6] 2 12 14 23 49 55
$ : int [1:8] 1 32 43 53 63 68 76 92
$ : int [1:4] 22 38 57 87
$ : int [1:2] 37 72
$ : int [1:2] 20 57
$ : int [1:5] 12 18 36 55 81
$ : int [1:4] 9 10 71 78
$ : int [1:6] 7 16 52 54 69 74
$ : int [1:6] 9 43 47 63 78 82
$ : int 15
$ : int 48
$ : int [1:7] 30 34 41 62 76 80 84
$ : int [1:5] 29 34 49 80 99
$ : int [1:6] 52 54 67 71 82 96
$ : int [1:5] 19 39 68 73 92
$ : int [1:5] 42 59 64 74 98
$ : int [1:7] 29 30 41 79 85 86 99
$ : int [1:7] 39 42 51 64 91 92 93
$ : int [1:3] 23 55 60
$ : int [1:5] 15 21 46 70 72
$ : int [1:3] 20 57 87
$ : int [1:5] 32 35 73 91 92
$ : int [1:4] 54 74 96 98
$ : int [1:7] 1 17 29 34 76 79 85
$ : int [1:7] 8 33 35 59 64 66 93
$ : int [1:7] 19 26 51 53 63 82 92
$ : int [1:6] 11 45 50 58 87 88
$ : int [1:5] 11 44 75 81 88
$ : int [1:3] 8 37 66
$ : int [1:5] 26 63 77 78 83
$ : int [1:4] 7 28 89 94
$ : int [1:9] 2 13 18 30 55 60 80 97 99
$ : int [1:4] 44 57 87 88
$ : int [1:7] 35 43 64 82 92 96 98
$ : int [1:5] 16 25 31 54 67
$ : int [1:3] 19 43 63
$ : int [1:6] 25 31 40 52 74 96
$ : int [1:6] 12 18 23 36 49 60
$ : int [1:6] 6 11 12 61 81 100
$ : int [1:5] 20 22 38 50 87
$ : int [1:3] 11 44 100
$ : int [1:6] 7 8 33 42 74 94
$ : int [1:5] 13 36 49 55 90
$ : int [1:3] 6 56 100
$ : int [1:7] 4 29 63 76 77 80 84
$ : int [1:9] 19 26 43 47 53 62 76 77 83
$ : int [1:7] 33 35 42 51 92 93 98
$ : int [1:2] 10 71
$ : int [1:3] 8 42 46
$ : int [1:4] 16 31 52 71
$ : int [1:5] 1 17 19 32 73
$ : int [1:2] 7 25
$ : int [1:3] 15 37 72
$ : int [1:7] 9 10 24 31 65 67 82
$ : int [1:3] 21 37 70
$ : int [1:4] 17 32 39 68
$ : int [1:7] 7 25 33 40 54 59 98
$ : int [1:2] 45 81
$ : int [1:6] 1 19 29 41 62 63
$ : int [1:6] 4 47 62 63 83 84
$ : int [1:5] 9 24 26 47 83
$ : int [1:5] 1 17 34 41 85
$ : int [1:6] 13 29 30 49 62 84
$ : int [1:6] 11 12 23 45 56 75
$ : int [1:7] 9 26 31 43 51 71 96
$ : int [1:4] 47 63 77 78
$ : int [1:7] 4 13 29 62 77 80 90
$ : int [1:4] 34 41 79 86
$ : int [1:5] 3 34 85 97 99
$ : int [1:5] 20 38 44 50 57
$ : int [1:4] 11 44 45 50
$ : int [1:2] 48 94
$ : int [1:4] 4 13 60 84
$ : int [1:3] 35 39 93
$ : int [1:7] 19 32 35 39 43 51 64
$ : int [1:4] 35 42 64 91
$ : int [1:5] 7 8 48 59 89
$ : int [1:4] 5 6 14 97
$ : int [1:6] 31 40 51 54 82 98
$ : int [1:8] 2 3 5 14 49 86 95 99
$ : int [1:6] 33 40 51 64 74 96
$ : int [1:5] 30 34 49 86 97
$ : int [1:4] 11 56 58 61
- attr(*, "class")= chr "nb"
- attr(*, "region.id")= chr [1:100] "37001" "37003" "37005" "37007" ...
- attr(*, "GeoDa")=List of 2
..$ shpfile: chr "sids"
..$ ind : chr "rn"
- attr(*, "gal")= logi TRUE
- attr(*, "call")= logi TRUE
- attr(*, "sym")= logi TRUE
> data.frame(NAME=nc.sids at data$NAME, SID79=nc.sids at data$SID79,
> SID79.lag=lag(wts, nc.sids at data$SID79))
NAME SID79 SID79.lag
1 Alamance 11 11.000000
2 Alexander 2 10.500000
3 Alleghany 3 4.333333
4 Anson 4 7.750000
5 Ashe 0 3.666667
6 Avery 0 6.400000
7 Beaufort 4 5.166667
8 Bertie 5 5.200000
9 Bladen 5 21.400000
10 Brunswick 6 9.666667
11 Buncombe 18 5.142857
12 Burke 15 10.142857
13 Cabarrus 20 12.800000
14 Caldwell 9 7.666667
15 Camden 2 2.666667
16 Carteret 4 14.333333
17 Caswell 2 12.800000
18 Catawba 21 9.833333
19 Chatham 3 12.875000
20 Cherokee 1 1.500000
21 Chowan 1 1.000000
22 Clay 0 2.000000
23 Cleveland 21 15.400000
24 Columbus 17 10.000000
25 Craven 18 6.000000
26 Cumberland 57 9.333333
27 Currituck 2 2.000000
28 Dare 1 0.000000
29 Davidson 8 13.428571
30 Davie 3 8.000000
31 Duplin 7 11.500000
32 Durham 22 9.600000
33 Edgecombe 9 9.800000
34 Forsyth 18 9.428571
35 Franklin 0 11.428571
36 Gaston 26 21.000000
37 Gates 2 2.400000
38 Graham 1 2.000000
39 Granville 4 12.600000
40 Greene 4 15.250000
41 Guilford 38 8.714286
42 Halifax 17 3.857143
43 Harnett 10 17.000000
44 Haywood 8 6.500000
45 Henderson 8 7.600000
46 Hertford 5 3.333333
47 Hoke 6 22.200000
48 Hyde 0 1.250000
49 Iredell 5 11.555556
50 Jackson 5 4.250000
51 Johnston 13 12.571429
52 Jones 2 13.200000
53 Lee 6 6.000000
54 Lenoir 14 10.833333
55 Lincoln 7 20.500000
56 McDowell 5 7.333333
57 Macon 3 1.800000
58 Madison 2 9.000000
59 Martin 1 7.666667
60 Mecklenburg 35 13.400000
61 Mitchell 2 2.000000
62 Montgomery 8 7.285714
63 Moore 5 13.888889
64 Nash 7 12.142857
65 New Hanover 9 4.500000
66 Northampton 3 9.000000
67 Onslow 23 4.000000
68 Orange 6 8.400000
69 Pamlico 1 11.000000
70 Pasquotank 4 1.333333
71 Pender 3 10.142857
72 Perquimans 0 2.333333
73 Person 4 8.500000
74 Pitt 11 9.000000
75 Polk 0 8.000000
76 Randolph 12 12.166667
77 Richmond 7 7.666667
78 Robeson 26 20.200000
79 Rockingham 5 14.800000
80 Rowan 8 8.500000
81 Rutherford 8 11.166667
82 Sampson 4 16.857143
83 Scotland 16 11.000000
84 Stanly 7 9.142857
85 Stokes 5 16.750000
86 Surry 6 6.800000
87 Swain 2 3.600000
88 Transylvania 4 9.750000
89 Tyrrell 0 0.000000
90 Union 9 16.500000
91 Vance 6 2.000000
92 Wake 31 8.428571
93 Warren 2 7.500000
94 Washington 0 2.000000
95 Watauga 1 4.000000
96 Wayne 23 9.166667
97 Wilkes 7 3.375000
98 Wilson 13 11.166667
99 Yadkin 1 7.800000
100 Yancey 1 6.750000
> Alleghany.before <- data.frame(NAME=nc.sids at data$NAME,
> SID79=nc.sids at data$SID79, SID79.lag=lag.listw(wts,
> nc.sids at data$SID79))[nc.sids at data$NAME=="Alleghany",] Alleghany.before
NAME SID79 SID79.lag
3 Alleghany 3 4.333333
>
> ##plot neighbors and county names
> plot(nc.sids, border="grey")
> plot(ncCR85_nb, coordinates(nc.sids), add=TRUE, col="blue")
> text(coordinates(nc.sids)+.1, labels=nc.sids at data$NAME, cex=.4)
> title(main="Contiguous neighbors")
>
> ##recode Wilkes County SID79 from 7 to NA for illustrating behavior of
> lagging with missing attribute nc.sids at data$SID79[97] <- NA
> data.frame(NAME=nc.sids at data$NAME, SID79=nc.sids at data$SID79,
> SID79.lag=lag(wts, nc.sids at data$SID79))[97,]
NAME SID79 SID79.lag
97 Wilkes NA 3.375
> Alleghany.after <- data.frame(NAME=nc.sids at data$NAME,
> SID79=nc.sids at data$SID79, SID79.lag=lag.listw(wts,
> nc.sids at data$SID79))[nc.sids at data$NAME=="Alleghany",] Alleghany.before
NAME SID79 SID79.lag
3 Alleghany 3 4.333333
> Alleghany.after
NAME SID79 SID79.lag
3 Alleghany 3 2
> data.frame(NAME=nc.sids at data$NAME, SID79=nc.sids at data$SID79,
> SID79.lag=lag(wts, nc.sids at data$SID79))[5,]
NAME SID79 SID79.lag
5 Ashe 0 1.333333
> data.frame(NAME=nc.sids at data$NAME, SID79=nc.sids at data$SID79,
> SID79.lag=lag(wts, nc.sids at data$SID79))[86,]
NAME SID79 SID79.lag
86 Surry 6 5.4
>
> ##Note that if the spatial weights matrix excluded neighbors with
> missing attribute values, then the spatial lag would be 3 (i.e.,
> (0+6)/2), which is closer to true value of 4.3.
>
> ##########
More information about the R-sig-Geo
mailing list