[R-sig-Geo] R encounters fatal error in errorsarlm - sdpep package

Roger Bivand Roger.Bivand at nhh.no
Tue Oct 1 20:58:33 CEST 2013


On Tue, 1 Oct 2013, Roger Bivand wrote:

> On Tue, 1 Oct 2013, Philine Gaffron wrote:
>
>> Thank you for the Geographical Analysis reference, very helpful. I have 
>> deduced that the method argument should be "Matrix" for my data rather than 
>> "LU", since the "W" weights are symmetric. However, I am still getting the 
>> warnings on non convergence and since you state that this maybe a 
>> specification problem, I would like to find the error.
>
> I think that, unless you need the Hausman test from errorsarlm, just turn it 
> off with errorsarlm(..., control=list(returnHcov=FALSE)); the ML fit is quite 
> OK, it is only the Hausman test component that is giving the warnings for the 
> spatial coefficient close to its limits, and when using other methods than 
> "eigen". If you do need it and cannot use method="eigen", increase the order 
> until the warning (it is a warning, not an error) goes away with 
> errorsarlm(..., control=list(pWOrder=<big integer>)). I have had win-builder 
> roll up a Windows binary of the development version with more details on the 
> problem, on:
>
> http://win-builder.r-project.org/5IVvaH0M3Qsj

Use rather:

http://win-builder.r-project.org/21E63UI6JzHH

as the first attempt was not completely though through.

Roger

>
> See the examples for powerWeight, and look at the returned component for the 
> warnining-displaying errorsarlm fit called $pWinternal showing how 
> convergence failed; series needs to fall below tol for convergence.
>
> As your n is not large, you can compare the "eigen" and "Matrix" methods to 
> see how the Hausman test output diverges as the approximation to the term 
> needed deteriorates.
>
> The underlying problem that you have touched is (as in the sections in Bivand 
> et al. 2013 on approximations) that using power series to generate the 
> product of the inverse of the nxn (I - \rho W) matrix and an arbitrary vector 
> also deteriotates when \rho is near its limits.
>
> Thanks for the detailed results; I've deleted them in this reply to save 
> space.
>
> Roger
>
>> I did eliminate all polygons without neighbours for all my datasets before 
>> calling errorsarlm, so this should not be the problem. I include a full 
>> code example inline below (including documentation of the intermediate 
>> objects and the R outputs) for one dataset with warning and one without 
>> below in case you have time to look at it and find a possible error source 
>> that I have not identified.
>> 
>> comments on crash also inline
>> 
>> Am 30.09.2013 12:31, schrieb Roger Bivand:
>>> On Mon, 30 Sep 2013, Philine Gaffron wrote:
>>> 
>>>> Thank you for the response, Roger. My answers below. I have also added a 
>>>> question on a warning message I have received and would be glad to find 
>>>> out whether this means I need to change the analysis approach or just 
>>>> have to take it as additional information.
>>> 
>>> Thanks for reporting back: some points for clarification inline.
>>> 
>>>> 
>>>> Am 25.09.2013 00:53, schrieb Roger Bivand:
>>>>> On Wed, 25 Sep 2013, Philine Gaffron wrote:
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> I am trying to run a spatial regression analysis using the errorsarlm
>>>>>> function in the sdpep package. I first ran into memory problems (on a
>>>>>> larger dataset than the one described below) but have adjusted the
>>>>>> memory size limit:
>>>>>> 
>>>>>>> memory.limit(size = 800000)
>>>>>> [1] 8e+05
>>>>> 
>>>>> Which sets your memory limit under Windows to 781 Gb (size is in Mb) -
>>>>> that probably explains the error exit unless you have a lot of memory.
>>>>> Best avoid making any changes from the default in a fresh session
>>>>> unless you know exactly what you are doing.
>>>> 
>>>> I did up the virtual memory of the machine to 800Gb first (it has 1Tb in 
>>>> total and there is currently nothing else running on it) so I hope that 
>>>> that was not the problem. Thanks for the hint, though.
>>>> 
>>>>> 
>>>>>> 
>>>>>> However, when I try to execute the following code in R Studio, I
>>>>>> receive the message "R Session aborted - R encountered a fatal error
>>>>>> - The session was terminated"
>>>>> 
>>>>> Always rerun any code with problems outside R Studio, because it can
>>>>> interfere with the reporting of warnings and errors. Certainly
>>>>> errorsarlm() has no reason to cause R to crash - and this is what you
>>>>> are suggesting. If you can reproduce this outside R Studio, and
>>>>> without setting memory limits, I may need to see your data to try to
>>>>> reproduce any error exit from the R session.
>>>> 
>>>> I tried this and had no more crashes - especially after I made the 
>>>> changes suggested/outlined below
>>> 
>>> Does this mean that the R error exit occurred when you used R Studio, and 
>>> did not occur when R was run directly, using exactly the same data and 
>>> code? If so, please also let the R Studio people know, so that they can 
>>> continue debugging.
>> 
>> I will try to reproduce this once I have understood / solved my 
>> non-convergence problem. If and when I do, I will send the code and data to 
>> the RStudio people and cc you (?).
>>> 
>>>> 
>>>>> 
>>>>>> 
>>>>>>> ElDU.slm.PM25.white <- errorsarlm(PM_PM25 ~ race_white, data =
>>>>>>> ElDorado_urban, listw=ElDoU_listw)
>>>>> 
>>>>> Did you read the advice on ?errorsarlm about the use of alternative
>>>>> methods, such as "LU" or "Matrix" for larger data sets? You are using
>>>>> the default "eigen" method, but this should certainly have worked for
>>>>> the modest n=2081. Can you reproduce the problem with a built-in data
>>>>> set, such as elect80?
>>>> 
>>>> I opted for the "LU" method and changed the weights style (see below). Is 
>>>> there any documentation that tells me the implications of using method 
>>>> ="LU"? I only found the note in the spdep documentation "LU_order default 
>>>> FALSE; used in “LU_prepermutate”, note warnings given for lu method" but 
>>>> nothing further. I did receive warnings in the output, though (for 
>>>> several but not all of my datasets) and I am not sure what they mean / 
>>>> what their implications are for the results. Do I need to choose another 
>>>> method? The warnings in every case were:
>>>> 
>>> 
>>> See Bivand, Hauke & Kossowski (2013) Geographical Analysis for some 
>>> details; LU_prepermutate emulates the implementation by finding an initial 
>>> fill-in re-ordering, but in R has little speed-up effect, hence it is 
>>> turned off by default.
>>> 
>>>> 1: In powerWeights(W = W, rho = lambda, order = con$pWOrder, X = B,:
>>>> not converged within order iterations
>>>> 
>>>> 2: In powerWeights(W = t(W), rho = lambda, order = con$pWOrder, X = C,:
>>>> not converged within order iterations
>>> 
>>> These are unusual, and may signal a mis-specified setting. The default 
>>> number of powers is 250, and lambda^250 is a very small number unless 
>>> lambda is close to its (probably upper) limit. The computation in fact 
>>> only affects the accuracy of the Hausman test on the difference between 
>>> OLS and spatial error coefficients, so may be adjusted if lambda is close 
>>> to limits by setting control argument pWOrder to a larger value than 250, 
>>> see ?powerWeights. As a symptom of lambda close to its limits, it suggests 
>>> mis-specification, possibly in terms of observation support, with 
>>> neighbouring properties sharing important drivers. This may be suggested 
>>> by your data scenario described below; scrutinising the intermediate 
>>> objects in your script may also put you on the track of things behaving in 
>>> unexpected ways. I would expect many observations to have no neighbours 
>>> with a 150m threshold, but you don't show no-neighbour control being 
>>> turned off.
>> I removed all polygons without neighbours prior to calling errorsarlm - see 
>> example below.
>>> 
>>> Hope the helps,
>>> 
>>> Roger
>>> 
>>>> 
>>>> I have not tried reproducing the error with the built-in dataset but if 
>>>> that is of interest to you from the point of view of spdep development, I 
>>>> could give it a try.
>>>> 
>>>>> 
>>>>>> 
>>>>>> The data was imported from an ArcGIS shape file:
>>>>>>> Parcels2008_inc_POP <- readOGR(".", "Parcels_2008_inc_pop")
>>>>>>> ElDorado_urban <- subset(Parcels2008_inc_POP, COUNTYFP10 == "017")
>>>>>> 
>>>>>> The subset contains data from 2081 polygons
>>>>>> 
>>>>>> The listw object was created as follows:
>>>>>> 
>>>>>>> Eld_U_coords <- coordinates(ElDorado_urban)
>>>>>>> Eld_U_IDs <- row.names(ElDorado_urban)
>>>>>>> ElDoU_nb <- dnearneigh(Eld_U_coords, d1 = 0, d2 = 492, row.names =
>>>>>>> Eld_U_IDs)
>>>>> 
>>>>> Is 492 a well-chosen upper distance threshold? I assume that the input
>>>>> data are projected, so that these are measures in the units of the
>>>>> projection - metres? If Eld_U_coords are actually geographical
>>>>> coordinates, but longlat=NULL, d2 is taken as a planar measure, if you
>>>>> declare longlat=TRUE, it is in km. There is no easy way to trap
>>>>> undeclared geographical coordinates.
>>>> 
>>>> The distance thresholds are in feet 
>>>> (NAD_1983_StatePlane_California_II_FIPS_0402_Feet) and are related to 
>>>> dispersion distances for road traffic emissions. For explanation: my 
>>>> polygons are land use parcels for which I have calculated emission loads 
>>>> in grams/area from traffic loads on a road network. Since I am assuming a 
>>>> dispersion distance of 492 feet / 150 m of emissions away from the 
>>>> network in the example above, I am further assuming, that parcels which 
>>>> are within that distance of one another are affected by the same emission 
>>>> sources (i.e. traffic loads on the network).
>>>> 
>>>>> 
>>>>>>> ElDoU_listw <- nb2listw(ElDoU_nb, glist=NULL, style="S",
>>>>>>> zero.policy=NULL)
>>>>>> 
>>>>> 
>>>>> Does choosing a different style make a difference ("S" is not often
>>>>> used)?
>>>> 
>>>> I chose "S" as a variance stabilising coding schemes following Bivand, 
>>>> Pebesma et al. (2008), p. 253 but have now switched back to "W". Apart 
>>>> from the warnings cited above I encountered no more issues.
>>>> 
>>>> Thank you for your help and for making this package available.
>>>> 
>>>> Philine
>>>> 
>>>>> 
>>>>> Roger
>>>>> 
>>>>>> Is there anything inherent in my approach that could cause this
>>>>>> problem or would one need to look at the data? I include the session
>>>>>> info below.
>>>>>> 
>>>>>> Thanks for any help,
>>>>>> 
>>>>>> Philine
>>>>>> 
>>>>>>> sessionInfo()
>>>>>> R version 3.0.1 (2013-05-16)
>>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>>> 
>>>>>> locale:
>>>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
>>>>>> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>>>>>> [5] LC_TIME=English_United States.1252
>>>>>> 
>>>>>> attached base packages:
>>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>> 
>>>>>> other attached packages:
>>>>>> [1] spdep_0.5-65 Matrix_1.0-12 lattice_0.20-15 rgdal_0.8-11 sp_1.0-13
>>>>>> 
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] boot_1.3-9 coda_0.16-1 deldir_0.0-22 grid_3.0.1 LearnBayes_2.12
>>>>>> MASS_7.3-26 nlme_3.1-109 splines_3.0.1 tools_3.0.1
>>>>>> 
>>>>>>> library(help=spdep)
>>>>>> Information on package ?spdep?
>>>>>> 
>>>>>> Description:
>>>>>> 
>>>>>> Package: spdep
>>>>>> Version: 0.5-65
>>>>>> Date: 2013-09-04
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>
>

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no


More information about the R-sig-Geo mailing list