[R-sig-Geo] Spatial Lags Excluding Neighbors' Missing Attribute Values

Sat Dec 27 22:45:32 CET 2008

Greetings,

I am seeking help with creating spatial lags that exclude neighbors with 
missing attribute values. I searched the spdep help documentation and the R 
Sig Geo Archives but was unable to find relevant information. (Please note 
that I posted an R Sig Geo search engine here: 
<http://mycroft.mozdev.org/search-engines.html?name=r+sig+geo>.)

As shown in the example below, the lag.listw function converts missing 
attribute values to zero before calculating the mean of neighbhors' 
attribute values (i.e., the spatial lag).

Do you have any advice for calculating spatial lags so that neighbors with 
missing attribute values are excluded (rather than treated as zero)? 
Alternatively, how might I remove neighbor links when the neighbor is 
missing an attribute value? I would prefer the former approach (exclusion 
from the calculation) because it wouldn't require creating a new neighbor 
list each time a new attribute is analyzed. However, the latter approach 
(editing the neighbor list) seems easier, if only I could get the list into 
a data frame-like object for editing, which I can not figure out how to do.

I would greatly appreciate your help and have provided a reproducible 
example below (see the very bottom of the of the example in particular). 
Please let me know if you need further clarification.

Regards,
Chris

-- 
Christopher Moore, M.P.P.
Doctoral Student
Quantitative Methods in Education
University of Minnesota
moor0554 at umn.edu
http://www.tc.umn.edu/~moor0554/

> ##########
> 
> library(spdep)
> 
> ##process and summarize nc.sids data nc.sids <- 
> readShapePoly(system.file("etc/shapes/sids.shp", package="spdep")[1], 
> ID="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66")) rn <- 
> sapply(slot(nc.sids, "polygons"), function(x) slot(x, "ID")) ncCR85_nb <- 
> read.gal(system.file("etc/weights/ncCR85.gal", package="spdep")[1], 
> region.id=rn) summary(ncCR85_nb, coordinates(nc.sids))
Neighbour list object:
Number of regions: 100 
Number of nonzero links: 492 
Percentage nonzero weights: 4.92 
Average number of links: 4.92 
Link number distribution:

 1  2  3  4  5  6  7  8  9 
 2  6 15 16 23 18 16  2  2 
2 least connected regions:
37053 37055 with 1 link
2 most connected regions:
37097 37125 with 9 links
Summary of link distances:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1197  0.3407  0.4018  0.4170  0.4922  0.7843 

  The decimal point is 1 digit(s) to the left of the |

  1 | 22
  1 | 778888888899
  2 | 4444
  2 | 5555555566666666667777778888888888888899999999
  3 | 00000000000000111111111111111111112222222222223333333344444444
  3 | 
55555555555555555555555555555555666666666666666666666677777777777777+26
  4 | 00000000000000000011111111111111111111112222222222222233333333333344
  4 | 55555555555555555566666666667777777777777777888888889999999999
  5 | 00000000111111111122222222222222222233333333334444
  5 | 5555555555556666666666667788888899
  6 | 00111122334444
  6 | 5555557788999999
  7 | 000044
  7 | 88

> wts <- nb2listw(ncCR85_nb)
> str(wts$neighbours)
List of 100
 $ : int [1:6] 17 19 41 68 76 79
 $ : int [1:4] 14 18 49 97
 $ : int [1:3] 5 86 97
 $ : int [1:4] 62 77 84 90
 $ : int [1:3] 3 95 97
 $ : int [1:5] 12 14 56 61 95
 $ : int [1:6] 25 48 59 69 74 94
 $ : int [1:5] 42 46 59 66 94
 $ : int [1:5] 24 26 71 78 82
 $ : int [1:3] 24 65 71
 $ : int [1:7] 44 45 56 58 81 88 100
 $ : int [1:7] 6 14 18 23 55 56 81
 $ : int [1:5] 49 60 80 84 90
 $ : int [1:6] 2 6 12 18 95 97
 $ : int [1:3] 27 37 70
 $ : int [1:3] 25 52 67
 $ : int [1:5] 1 41 68 73 79
 $ : int [1:6] 2 12 14 23 49 55
 $ : int [1:8] 1 32 43 53 63 68 76 92
 $ : int [1:4] 22 38 57 87
 $ : int [1:2] 37 72
 $ : int [1:2] 20 57
 $ : int [1:5] 12 18 36 55 81
 $ : int [1:4] 9 10 71 78
 $ : int [1:6] 7 16 52 54 69 74
 $ : int [1:6] 9 43 47 63 78 82
 $ : int 15
 $ : int 48
 $ : int [1:7] 30 34 41 62 76 80 84
 $ : int [1:5] 29 34 49 80 99
 $ : int [1:6] 52 54 67 71 82 96
 $ : int [1:5] 19 39 68 73 92
 $ : int [1:5] 42 59 64 74 98
 $ : int [1:7] 29 30 41 79 85 86 99
 $ : int [1:7] 39 42 51 64 91 92 93
 $ : int [1:3] 23 55 60
 $ : int [1:5] 15 21 46 70 72
 $ : int [1:3] 20 57 87
 $ : int [1:5] 32 35 73 91 92
 $ : int [1:4] 54 74 96 98
 $ : int [1:7] 1 17 29 34 76 79 85
 $ : int [1:7] 8 33 35 59 64 66 93
 $ : int [1:7] 19 26 51 53 63 82 92
 $ : int [1:6] 11 45 50 58 87 88
 $ : int [1:5] 11 44 75 81 88
 $ : int [1:3] 8 37 66
 $ : int [1:5] 26 63 77 78 83
 $ : int [1:4] 7 28 89 94
 $ : int [1:9] 2 13 18 30 55 60 80 97 99
 $ : int [1:4] 44 57 87 88
 $ : int [1:7] 35 43 64 82 92 96 98
 $ : int [1:5] 16 25 31 54 67
 $ : int [1:3] 19 43 63
 $ : int [1:6] 25 31 40 52 74 96
 $ : int [1:6] 12 18 23 36 49 60
 $ : int [1:6] 6 11 12 61 81 100
 $ : int [1:5] 20 22 38 50 87
 $ : int [1:3] 11 44 100
 $ : int [1:6] 7 8 33 42 74 94
 $ : int [1:5] 13 36 49 55 90
 $ : int [1:3] 6 56 100
 $ : int [1:7] 4 29 63 76 77 80 84
 $ : int [1:9] 19 26 43 47 53 62 76 77 83
 $ : int [1:7] 33 35 42 51 92 93 98
 $ : int [1:2] 10 71
 $ : int [1:3] 8 42 46
 $ : int [1:4] 16 31 52 71
 $ : int [1:5] 1 17 19 32 73
 $ : int [1:2] 7 25
 $ : int [1:3] 15 37 72
 $ : int [1:7] 9 10 24 31 65 67 82
 $ : int [1:3] 21 37 70
 $ : int [1:4] 17 32 39 68
 $ : int [1:7] 7 25 33 40 54 59 98
 $ : int [1:2] 45 81
 $ : int [1:6] 1 19 29 41 62 63
 $ : int [1:6] 4 47 62 63 83 84
 $ : int [1:5] 9 24 26 47 83
 $ : int [1:5] 1 17 34 41 85
 $ : int [1:6] 13 29 30 49 62 84
 $ : int [1:6] 11 12 23 45 56 75
 $ : int [1:7] 9 26 31 43 51 71 96
 $ : int [1:4] 47 63 77 78
 $ : int [1:7] 4 13 29 62 77 80 90
 $ : int [1:4] 34 41 79 86
 $ : int [1:5] 3 34 85 97 99
 $ : int [1:5] 20 38 44 50 57
 $ : int [1:4] 11 44 45 50
 $ : int [1:2] 48 94
 $ : int [1:4] 4 13 60 84
 $ : int [1:3] 35 39 93
 $ : int [1:7] 19 32 35 39 43 51 64
 $ : int [1:4] 35 42 64 91
 $ : int [1:5] 7 8 48 59 89
 $ : int [1:4] 5 6 14 97
 $ : int [1:6] 31 40 51 54 82 98
 $ : int [1:8] 2 3 5 14 49 86 95 99
 $ : int [1:6] 33 40 51 64 74 96
 $ : int [1:5] 30 34 49 86 97
 $ : int [1:4] 11 56 58 61
 - attr(*, "class")= chr "nb"
 - attr(*, "region.id")= chr [1:100] "37001" "37003" "37005" "37007" ...
 - attr(*, "GeoDa")=List of 2
  ..$ shpfile: chr "sids"
  ..$ ind    : chr "rn"
 - attr(*, "gal")= logi TRUE
 - attr(*, "call")= logi TRUE
 - attr(*, "sym")= logi TRUE
> data.frame(NAME=nc.sids at data$NAME, SID79=nc.sids at data$SID79, 
> SID79.lag=lag(wts, nc.sids at data$SID79))
            NAME SID79 SID79.lag
1       Alamance    11 11.000000
2      Alexander     2 10.500000
3      Alleghany     3  4.333333
4          Anson     4  7.750000
5           Ashe     0  3.666667
6          Avery     0  6.400000
7       Beaufort     4  5.166667
8         Bertie     5  5.200000
9         Bladen     5 21.400000
10     Brunswick     6  9.666667
11      Buncombe    18  5.142857
12         Burke    15 10.142857
13      Cabarrus    20 12.800000
14      Caldwell     9  7.666667
15        Camden     2  2.666667
16      Carteret     4 14.333333
17       Caswell     2 12.800000
18       Catawba    21  9.833333
19       Chatham     3 12.875000
20      Cherokee     1  1.500000
21        Chowan     1  1.000000
22          Clay     0  2.000000
23     Cleveland    21 15.400000
24      Columbus    17 10.000000
25        Craven    18  6.000000
26    Cumberland    57  9.333333
27     Currituck     2  2.000000
28          Dare     1  0.000000
29      Davidson     8 13.428571
30         Davie     3  8.000000
31        Duplin     7 11.500000
32        Durham    22  9.600000
33     Edgecombe     9  9.800000
34       Forsyth    18  9.428571
35      Franklin     0 11.428571
36        Gaston    26 21.000000
37         Gates     2  2.400000
38        Graham     1  2.000000
39     Granville     4 12.600000
40        Greene     4 15.250000
41      Guilford    38  8.714286
42       Halifax    17  3.857143
43       Harnett    10 17.000000
44       Haywood     8  6.500000
45     Henderson     8  7.600000
46      Hertford     5  3.333333
47          Hoke     6 22.200000
48          Hyde     0  1.250000
49       Iredell     5 11.555556
50       Jackson     5  4.250000
51      Johnston    13 12.571429
52         Jones     2 13.200000
53           Lee     6  6.000000
54        Lenoir    14 10.833333
55       Lincoln     7 20.500000
56      McDowell     5  7.333333
57         Macon     3  1.800000
58       Madison     2  9.000000
59        Martin     1  7.666667
60   Mecklenburg    35 13.400000
61      Mitchell     2  2.000000
62    Montgomery     8  7.285714
63         Moore     5 13.888889
64          Nash     7 12.142857
65   New Hanover     9  4.500000
66   Northampton     3  9.000000
67        Onslow    23  4.000000
68        Orange     6  8.400000
69       Pamlico     1 11.000000
70    Pasquotank     4  1.333333
71        Pender     3 10.142857
72    Perquimans     0  2.333333
73        Person     4  8.500000
74          Pitt    11  9.000000
75          Polk     0  8.000000
76      Randolph    12 12.166667
77      Richmond     7  7.666667
78       Robeson    26 20.200000
79    Rockingham     5 14.800000
80         Rowan     8  8.500000
81    Rutherford     8 11.166667
82       Sampson     4 16.857143
83      Scotland    16 11.000000
84        Stanly     7  9.142857
85        Stokes     5 16.750000
86         Surry     6  6.800000
87         Swain     2  3.600000
88  Transylvania     4  9.750000
89       Tyrrell     0  0.000000
90         Union     9 16.500000
91         Vance     6  2.000000
92          Wake    31  8.428571
93        Warren     2  7.500000
94    Washington     0  2.000000
95       Watauga     1  4.000000
96         Wayne    23  9.166667
97        Wilkes     7  3.375000
98        Wilson    13 11.166667
99        Yadkin     1  7.800000
100       Yancey     1  6.750000
> Alleghany.before <- data.frame(NAME=nc.sids at data$NAME, 
> SID79=nc.sids at data$SID79, SID79.lag=lag.listw(wts, 
> nc.sids at data$SID79))[nc.sids at data$NAME=="Alleghany",] Alleghany.before
       NAME SID79 SID79.lag
3 Alleghany     3  4.333333
> 
> ##plot neighbors and county names
> plot(nc.sids, border="grey")
> plot(ncCR85_nb, coordinates(nc.sids), add=TRUE, col="blue")
> text(coordinates(nc.sids)+.1, labels=nc.sids at data$NAME, cex=.4)
> title(main="Contiguous neighbors")
> 
> ##recode Wilkes County SID79 from 7 to NA for illustrating behavior of 
> lagging with missing attribute nc.sids at data$SID79[97] <- NA 
> data.frame(NAME=nc.sids at data$NAME, SID79=nc.sids at data$SID79, 
> SID79.lag=lag(wts, nc.sids at data$SID79))[97,]
     NAME SID79 SID79.lag
97 Wilkes    NA     3.375
> Alleghany.after <- data.frame(NAME=nc.sids at data$NAME, 
> SID79=nc.sids at data$SID79, SID79.lag=lag.listw(wts, 
> nc.sids at data$SID79))[nc.sids at data$NAME=="Alleghany",] Alleghany.before
       NAME SID79 SID79.lag
3 Alleghany     3  4.333333
> Alleghany.after
       NAME SID79 SID79.lag
3 Alleghany     3         2
> data.frame(NAME=nc.sids at data$NAME, SID79=nc.sids at data$SID79, 
> SID79.lag=lag(wts, nc.sids at data$SID79))[5,]
  NAME SID79 SID79.lag
5 Ashe     0  1.333333
> data.frame(NAME=nc.sids at data$NAME, SID79=nc.sids at data$SID79, 
> SID79.lag=lag(wts, nc.sids at data$SID79))[86,]
    NAME SID79 SID79.lag
86 Surry     6       5.4
> 
> ##Note that if the spatial weights matrix excluded neighbors with 
> missing attribute values, then the spatial lag would be 3 (i.e., 
> (0+6)/2), which is closer to true value of 4.3.
> 
> ##########