[R-sig-Geo] R encounters fatal error in errorsarlm - sdpep package

Philine Gaffron p.gaffron at tuhh.de
Tue Oct 1 03:26:37 CEST 2013


Thank you for the Geographical Analysis reference, very helpful. I have 
deduced that the method argument should be "Matrix" for my data rather 
than "LU", since the "W" weights are symmetric. However, I am still 
getting the warnings on non convergence and since you state that this 
maybe a specification problem, I would like to find the error. I did 
eliminate all polygons without neighbours for all my datasets before 
calling errorsarlm, so this should not be the problem. I include a full 
code example inline below (including documentation of the intermediate 
objects and the R outputs) for one dataset with warning and one without 
below in case you have time to look at it and find a possible error 
source that I have not identified.

comments on crash also inline

Am 30.09.2013 12:31, schrieb Roger Bivand:
> On Mon, 30 Sep 2013, Philine Gaffron wrote:
>
>> Thank you for the response, Roger. My answers below. I have also 
>> added a question on a warning message I have received and would be 
>> glad to find out whether this means I need to change the analysis 
>> approach or just have to take it as additional information.
>
> Thanks for reporting back: some points for clarification inline.
>
>>
>> Am 25.09.2013 00:53, schrieb Roger Bivand:
>>> On Wed, 25 Sep 2013, Philine Gaffron wrote:
>>>
>>>> Hello,
>>>>
>>>> I am trying to run a spatial regression analysis using the errorsarlm
>>>> function in the sdpep package. I first ran into memory problems (on a
>>>> larger dataset than the one described below) but have adjusted the
>>>> memory size limit:
>>>>
>>>>> memory.limit(size = 800000)
>>>> [1] 8e+05
>>>
>>> Which sets your memory limit under Windows to 781 Gb (size is in Mb) -
>>> that probably explains the error exit unless you have a lot of memory.
>>> Best avoid making any changes from the default in a fresh session
>>> unless you know exactly what you are doing.
>>
>> I did up the virtual memory of the machine to 800Gb first (it has 1Tb 
>> in total and there is currently nothing else running on it) so I hope 
>> that that was not the problem. Thanks for the hint, though.
>>
>>>
>>>>
>>>> However, when I try to execute the following code in R Studio, I
>>>> receive the message "R Session aborted - R encountered a fatal error
>>>> - The session was terminated"
>>>
>>> Always rerun any code with problems outside R Studio, because it can
>>> interfere with the reporting of warnings and errors. Certainly
>>> errorsarlm() has no reason to cause R to crash - and this is what you
>>> are suggesting. If you can reproduce this outside R Studio, and
>>> without setting memory limits, I may need to see your data to try to
>>> reproduce any error exit from the R session.
>>
>> I tried this and had no more crashes - especially after I made the 
>> changes suggested/outlined below
>
> Does this mean that the R error exit occurred when you used R Studio, 
> and did not occur when R was run directly, using exactly the same data 
> and code? If so, please also let the R Studio people know, so that 
> they can continue debugging.

I will try to reproduce this once I have understood / solved my 
non-convergence problem. If and when I do, I will send the code and data 
to the RStudio people and cc you (?).
>
>>
>>>
>>>>
>>>>> ElDU.slm.PM25.white <- errorsarlm(PM_PM25 ~ race_white, data =
>>>>> ElDorado_urban, listw=ElDoU_listw)
>>>
>>> Did you read the advice on ?errorsarlm about the use of alternative
>>> methods, such as "LU" or "Matrix" for larger data sets? You are using
>>> the default "eigen" method, but this should certainly have worked for
>>> the modest n=2081. Can you reproduce the problem with a built-in data
>>> set, such as elect80?
>>
>> I opted for the "LU" method and changed the weights style (see 
>> below). Is there any documentation that tells me the implications of 
>> using method ="LU"? I only found the note in the spdep documentation 
>> "LU_order default FALSE; used in “LU_prepermutate”, note warnings 
>> given for lu method" but nothing further. I did receive warnings in 
>> the output, though (for several but not all of my datasets) and I am 
>> not sure what they mean / what their implications are for the 
>> results. Do I need to choose another method? The warnings in every 
>> case were:
>>
>
> See Bivand, Hauke & Kossowski (2013) Geographical Analysis for some 
> details; LU_prepermutate emulates the implementation by finding an 
> initial fill-in re-ordering, but in R has little speed-up effect, 
> hence it is turned off by default.
>
>> 1: In powerWeights(W = W, rho = lambda, order = con$pWOrder, X = B,:
>> not converged within order iterations
>>
>> 2: In powerWeights(W = t(W), rho = lambda, order = con$pWOrder, X = C,:
>> not converged within order iterations
>
> These are unusual, and may signal a mis-specified setting. The default 
> number of powers is 250, and lambda^250 is a very small number unless 
> lambda is close to its (probably upper) limit. The computation in fact 
> only affects the accuracy of the Hausman test on the difference 
> between OLS and spatial error coefficients, so may be adjusted if 
> lambda is close to limits by setting control argument pWOrder to a 
> larger value than 250, see ?powerWeights. As a symptom of lambda close 
> to its limits, it suggests mis-specification, possibly in terms of 
> observation support, with neighbouring properties sharing important 
> drivers. This may be suggested by your data scenario described below; 
> scrutinising the intermediate objects in your script may also put you 
> on the track of things behaving in unexpected ways. I would expect 
> many observations to have no neighbours with a 150m threshold, but you 
> don't show no-neighbour control being turned off.
I removed all polygons without neighbours prior to calling errorsarlm - 
see example below.
>
> Hope the helps,
>
> Roger
>
Here is the worked example of the two different datasets. No warning for 
the first one, warnings received for the second one included (sorry, 
this is a bit lengthy but I did not want to leave out anything that 
might be relevant).

####READING IN ArcGIS SHAPEFILE###############################

Parcels2008_inc_POP <- readOGR(".", "Parcels_2008_inc_pop")
Parcels2008_UNinc_POP <- readOGR(".", "Parcels_2008_UNinc_pop")


########EXTRACT SUBSETS OF Parcels2008 SpatialPolygonsDataframe####
ElDorado_rural <- subset(Parcels2008_UNinc_POP, COUNTYFP10 == "017")
ElDorado_urban <- subset(Parcels2008_inc_POP, COUNTYFP10 == "017")


##############TO CREATE A DISTANCE BASED NEIGHBOURS LIST IN 
R########################

# FIRST, calculate coords of feature centroids####

Eld_R_coords <- coordinates(ElDorado_rural)
Eld_U_coords <- coordinates(ElDorado_urban)


##SECOND, create vector with row names ####

Eld_R_IDs <- row.names(ElDorado_rural)
Eld_U_IDs <- row.names(ElDorado_urban)


###THIRD, creating the nb objects for calculating spatial weights####

##DATASET ONE: NO WARNING FROM errorsarlm (see below)####
ElDoR_nb <- dnearneigh(Eld_R_coords, d1 = 0, d2 = 984, row.names = 
Eld_R_IDs)
ElDoR_nb
# Neighbour list object:
# Number of regions: 26863
# Number of nonzero links: 1591926
# Percentage nonzero weights: 0.2206042
# Average number of links: 59.26092
# 34 regions with no links:
# 0 83 872 993 2301 2420 7518 12579 14649 17136 17148 18526 18601 18786 
20413
# 20427 20662 21445 21824 22497 23309 23421 23535 24366 24743 24784 
25007 25009
# 25107 25184 25203 25229 25421 25465

#remove parcels/regions with no links
ElDorado_rural <- subset(ElDorado_rural, subset=card(ElDoR_nb) > 0)

Eld_R_coords <- coordinates(ElDorado_rural)
summary(Eld_R_coords)
# V1 V2
# Min. :6816265 Min. :1953502
# 1st Qu.:6845579 1st Qu.:2007471
# Median :6876165 Median :2016977
# Mean :6884229 Mean :2021703
# 3rd Qu.:6918361 3rd Qu.:2034467
# Max. :7091505 Max. :2108970

Eld_R_IDs <- row.names(ElDorado_rural)
summary(Eld_R_IDs)
# Length Class Mode
# 26829 character character
ElDoR_nb <- dnearneigh(Eld_R_coords, d1 = 0, d2 = 984, row.names = 
Eld_R_IDs)
ElDoR_nb
# Neighbour list object:
# Number of regions: 26829
# Number of nonzero links: 1591926
# Percentage nonzero weights: 0.2211637
# Average number of links: 59.33602

is.symmetric.nb(ElDoR_nb)
# [1] TRUE


##DATASET TWO:NON-CONVERGENCE WARNING FROM errorsarlm (see below)########
ElDoU_nb <- dnearneigh(Eld_U_coords, d1 = 0, d2 = 492, row.names = 
Eld_U_IDs)
ElDoU_nb
# Neighbour list object:
# Number of regions: 2094
# Number of nonzero links: 57172
# Percentage nonzero weights: 1.303857
# Average number of links: 27.30277
# 13 regions with no links:
# 425 468 628 929 1216 1218 1224 1252 1363 1416 1927 2007 2088

#remove parcels/regions with no links
ElDorado_urban <- subset(ElDorado_urban, subset=card(ElDoU_nb) > 0)
Eld_U_coords <- coordinates(ElDorado_urban)
summary(Eld_U_coords)
# V1 V2
# Min. :6892551 Min. :2024721
# 1st Qu.:6901757 1st Qu.:2028604
# Median :6904635 Median :2030283
# Mean :6905218 Mean :2030021
# 3rd Qu.:6908098 3rd Qu.:2031299
# Max. :6916815 Max. :2035333

Eld_U_IDs <- row.names(ElDorado_urban)
summary(Eld_U_IDs)
# Length Class Mode
# 2081 character characte

ElDoU_nb <- dnearneigh(Eld_U_coords, d1 = 0, d2 = 492, row.names = 
Eld_U_IDs)
ElDoU_nb
# Neighbour list object:
# Number of regions: 2081
# Number of nonzero links: 57172
# Percentage nonzero weights: 1.320198
# Average number of links: 27.47333
is.symmetric.nb(ElDoU_nb)
# [1] TRUE



####WEIGHTS######################################

#ElDorado rural
ElDoR_listw <- nb2listw(ElDoR_nb, glist=NULL, style="W", zero.policy=NULL)
names(ElDoR_listw)
# [1] "style" "neighbours" "weights"
summary(ElDoR_listw)
# Characteristics of weights list object:
# Neighbour list object:
# Number of regions: 26829
# Number of nonzero links: 1591926
# Percentage nonzero weights: 0.2211637
# Average number of links: 59.33602
# Link number distribution:
#
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
# 85 160 267 454 692 870 1007 1016 940 802 751 583 541 457 377 347 301 
261 209 221 234 193
# 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
# 191 207 216 183 185 185 161 184 175 209 189 163 178 155 164 138 138 
151 134 135 116 125
# 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
# 96 97 106 101 107 116 100 122 110 103 125 103 105 105 107 112 77 101 
95 95 102 102
# 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
# 69 102 110 87 90 96 135 94 107 90 111 98 99 96 135 125 98 118 110 123 
94 76
# 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 
109 110
# 101 113 95 105 84 99 89 102 94 98 114 107 91 104 116 90 98 97 110 93 
103 97
# 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 
128 129 130 131 132
# 107 98 114 111 93 100 77 103 101 95 109 96 79 103 99 80 78 86 83 93 84 80
# 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 
150 151 152 153 154
# 72 90 91 93 82 81 82 71 77 70 84 67 57 74 69 87 74 74 77 52 76 70
# 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 
172 173 174 175 176
# 49 60 52 42 62 57 56 47 56 45 59 58 53 53 61 47 32 26 39 37 35 39
# 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 
194 195 196 197 198
# 30 28 32 17 26 22 16 14 16 18 7 13 9 12 15 13 18 6 7 18 12 11
# 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 
216 217 218 219 220
# 13 13 13 16 13 10 17 12 12 11 8 9 1 4 8 6 2 7 6 8 5 3
# 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 
238 239 240 241 242
# 8 7 4 7 3 10 1 2 4 2 6 9 1 6 6 9 7 4 7 6 1 5
# 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 
261 262 263 264 265
# 4 2 5 3 5 4 3 2 5 4 1 2 5 2 4 5 4 2 3 4 2 3
# 266 267 268 269 270 272 273 274 275 276 278 279 280 282 283 284 285 
286 287 289 290 292
# 2 2 2 4 6 2 1 3 1 1 4 1 2 6 4 8 6 5 2 3 1 3
# 293 295 297
# 7 2 1
# 85 least connected regions:
# 37 261 262 263 268 334 585 1011 1012 1035 2419 2546 2621 2659 3258 
4544 4557 5501 5502 5859 10005 10006 10007 12577 12578 16301 17256 17257 
18697 18788 19339 19445 20499 20658 21104 21105 21440 21441 21494 21829 
21981 21982 22039 22056 22154 22208 22310 22315 22316 22396 22397 22398 
22423 22816 22823 22989 22990 23121 23133 23554 23621 23771 23772 23823 
23891 23903 24313 24348 24437 24464 24676 24802 24803 24886 25008 25306 
25354 25466 25634 26060 26432 26433 26452 26453 26509 with 1 link
# 1 most connected region:
# 9783 with 297 links
#
# Weights style: W
# Weights constants summary:
# n nn S0 S1 S2
# W 26829 719795241 26829 3194.683 108332.1
summary(sapply(ElDoR_listw$weights, sum))
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 1 1 1 1 1 1



#El Dorado urban
ElDoU_listw <- nb2listw(ElDoU_nb, glist=NULL, style="W", zero.policy=NULL)
summary(ElDoU_listw)
# Characteristics of weights list object:
# Neighbour list object:
# Number of regions: 2081
# Number of nonzero links: 57172
# Percentage nonzero weights: 1.320198
# Average number of links: 27.47333
# Link number distribution:
#
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
27 28 29 30 31 32 33 34 35 36 37
# 14 15 29 29 21 27 34 36 35 32 43 58 46 44 49 53 60 40 56 42 56 49 50 
58 54 46 48 40 49 45 45 38 31 38 37 26 29
# 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 
61 62 63 64 65 66
# 32 37 33 38 32 32 33 31 34 34 43 28 26 15 17 17 18 16 14 5 14 9 3 4 5 
3 2 3 1
# 14 least connected regions:
# 147 148 159 378 466 467 596 629 642 1217 1705 1710 2074 2087 with 1 link
# 1 most connected region:
# 865 with 66 links
#
# Weights style: W
# Weights constants summary:
# n nn S0 S1 S2
# W 2081 4330561 2081 260.5586 8405.665
summary(sapply(ElDoU_listw$weights, sum))
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 1 1 1 1 1 1




####SPATIAL REGRESSION ANALYSIS#####


##El Dorado rural################################
ElDR.slm.PM25.white <- errorsarlm(PM_PM25 ~ race_white, data = 
ElDorado_rural, listw=ElDoR_listw, method = "Matrix")
ElDR.PM25.white <- summary(ElDR.slm.PM25.white, Nagelkerke = TRUE, 
digits = 4, signif.stars = TRUE)
ElDR.PM25.white
# Call:errorsarlm(formula = PM_PM25 ~ race_white, data = ElDorado_rural,
# listw = ElDoR_listw, method = "Matrix")
#
# Residuals:
# Min 1Q Median 3Q Max
# -0.37948413 -0.00236403 -0.00096381 0.00088396 0.69478272
#
# Type: error
# Coefficients: (asymptotic standard errors)
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 3.0372e-02 3.0068e-03 10.1013 < 2.2e-16
# race_white -5.7658e-05 1.4887e-05 -3.8731 0.0001074
#
# Lambda: 0.94999, LR test value: 32485, p-value: < 2.22e-16
# Approximate (numerical Hessian) standard error: 0.0022514
# z-value: 421.96, p-value: < 2.22e-16
# Wald statistic: 178050, p-value: < 2.22e-16
#
# Log likelihood: 62497.7 for error model
# ML residual variance (sigma squared): 0.00049093, (sigma: 0.022157)
# Nagelkerke pseudo-R-squared: 0.70418
# Number of observations: 26829
# Number of parameters estimated: 4
# AIC: -124990, (AIC for lm: -92505)



##El Dorado urban####################################
ElDU.slm.PM25.white <- errorsarlm(PM_PM25 ~ race_white, data = 
ElDorado_urban, listw=ElDoU_listw, method = "Matrix")
# Warning messages:
# 1: In powerWeights(W = W, rho = lambda, order = con$pWOrder, X = B, :
# not converged within order iterations
# 2: In powerWeights(W = t(W), rho = lambda, order = con$pWOrder, X = C, :
# not converged within order iterations
ElDU.PM25.white <- summary(ElDU.slm.PM25.white, Nagelkerke = TRUE, 
digits = 4, signif.stars = TRUE)
ElDU.PM25.white
# Call:errorsarlm(formula = PM_PM25 ~ race_white, data = ElDorado_urban,
# listw = ElDoU_listw, method = "Matrix")
#
# Residuals:
# Min 1Q Median 3Q Max
# -0.1277026 -0.0080154 -0.0023227 0.0049338 0.3288472
#
# Type: error
# Coefficients: (asymptotic standard errors)
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 5.3598e-02 1.5882e-02 3.3747 0.0007389
# race_white -2.8711e-05 8.8431e-05 -0.3247 0.7454294
#
# Lambda: 0.95004, LR test value: 2875.6, p-value: < 2.22e-16
# Approximate (numerical Hessian) standard error: 0.0070736
# z-value: 134.31, p-value: < 2.22e-16
# Wald statistic: 18039, p-value: < 2.22e-16
#
# Log likelihood: 4093.031 for error model
# ML residual variance (sigma squared): 0.00099886, (sigma: 0.031605)
# Nagelkerke pseudo-R-squared: 0.74927
# Number of observations: 2081
# Number of parameters estimated: 4
# AIC: -8178.1, (AIC for lm: -5304.5)



###Session Info######
# R Studio version 0.97.551

sessionInfo()
# R version 3.0.1 (2013-05-16)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
#
# locale:
# [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United 
States.1252
# [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
# [5] LC_TIME=English_United States.1252
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] spdep_0.5-65 Matrix_1.0-12 lattice_0.20-15 rgdal_0.8-11 sp_1.0-13
#
# loaded via a namespace (and not attached):
# [1] boot_1.3-9 coda_0.16-1 deldir_0.0-22 grid_3.0.1 LearnBayes_2.12 
MASS_7.3-26
# [7] nlme_3.1-109 splines_3.0.1 tools_3.0.1

>>
>>
>> I have not tried reproducing the error with the built-in dataset but 
>> if that is of interest to you from the point of view of spdep 
>> development, I could give it a try.
>>
>>>
>>>>
>>>> The data was imported from an ArcGIS shape file:
>>>>> Parcels2008_inc_POP <- readOGR(".", "Parcels_2008_inc_pop")
>>>>> ElDorado_urban <- subset(Parcels2008_inc_POP, COUNTYFP10 == "017")
>>>>
>>>> The subset contains data from 2081 polygons
>>>>
>>>> The listw object was created as follows:
>>>>
>>>>> Eld_U_coords <- coordinates(ElDorado_urban)
>>>>> Eld_U_IDs <- row.names(ElDorado_urban)
>>>>> ElDoU_nb <- dnearneigh(Eld_U_coords, d1 = 0, d2 = 492, row.names =
>>>>> Eld_U_IDs)
>>>
>>> Is 492 a well-chosen upper distance threshold? I assume that the input
>>> data are projected, so that these are measures in the units of the
>>> projection - metres? If Eld_U_coords are actually geographical
>>> coordinates, but longlat=NULL, d2 is taken as a planar measure, if you
>>> declare longlat=TRUE, it is in km. There is no easy way to trap
>>> undeclared geographical coordinates.
>>
>> The distance thresholds are in feet 
>> (NAD_1983_StatePlane_California_II_FIPS_0402_Feet) and are related to 
>> dispersion distances for road traffic emissions. For explanation: my 
>> polygons are land use parcels for which I have calculated emission 
>> loads in grams/area from traffic loads on a road network. Since I am 
>> assuming a dispersion distance of 492 feet / 150 m of emissions away 
>> from the network in the example above, I am further assuming, that 
>> parcels which are within that distance of one another are affected by 
>> the same emission sources (i.e. traffic loads on the network).
>>
>>>
>>>>> ElDoU_listw <- nb2listw(ElDoU_nb, glist=NULL, style="S",
>>>>> zero.policy=NULL)
>>>>
>>>
>>> Does choosing a different style make a difference ("S" is not often
>>> used)?
>>
>> I chose "S" as a variance stabilising coding schemes following 
>> Bivand, Pebesma et al. (2008), p. 253 but have now switched back to 
>> "W". Apart from the warnings cited above I encountered no more issues.
>>
>> Thank you for your help and for making this package available.
>>
>> Philine
>>
>>>
>>> Roger
>>>
>>>> Is there anything inherent in my approach that could cause this
>>>> problem or would one need to look at the data? I include the session
>>>> info below.
>>>>
>>>> Thanks for any help,
>>>>
>>>> Philine
>>>>
>>>>> sessionInfo()
>>>> R version 3.0.1 (2013-05-16)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
>>>> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>>>> [5] LC_TIME=English_United States.1252
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] spdep_0.5-65 Matrix_1.0-12 lattice_0.20-15 rgdal_0.8-11 sp_1.0-13
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] boot_1.3-9 coda_0.16-1 deldir_0.0-22 grid_3.0.1 LearnBayes_2.12
>>>> MASS_7.3-26 nlme_3.1-109 splines_3.0.1 tools_3.0.1
>>>>
>>>>> library(help=spdep)
>>>> Information on package ?spdep?
>>>>
>>>> Description:
>>>>
>>>> Package: spdep
>>>> Version: 0.5-65
>>>> Date: 2013-09-04
>>>>
>>>>
>>>
>>
>>
>



More information about the R-sig-Geo mailing list