[R-sig-Geo] R encounters fatal error in errorsarlm - sdpep package
Roger Bivand
Roger.Bivand at nhh.no
Tue Oct 1 13:12:07 CEST 2013
On Tue, 1 Oct 2013, Philine Gaffron wrote:
> Thank you for the Geographical Analysis reference, very helpful. I have
> deduced that the method argument should be "Matrix" for my data rather than
> "LU", since the "W" weights are symmetric. However, I am still getting the
> warnings on non convergence and since you state that this maybe a
> specification problem, I would like to find the error.
I think that, unless you need the Hausman test from errorsarlm, just turn
it off with errorsarlm(..., control=list(returnHcov=FALSE)); the ML fit is
quite OK, it is only the Hausman test component that is giving the
warnings for the spatial coefficient close to its limits, and when using
other methods than "eigen". If you do need it and cannot use
method="eigen", increase the order until the warning (it is a warning, not
an error) goes away with errorsarlm(..., control=list(pWOrder=<big
integer>)). I have had win-builder roll up a Windows binary of the
development version with more details on the problem, on:
http://win-builder.r-project.org/5IVvaH0M3Qsj
See the examples for powerWeight, and look at the returned component for
the warnining-displaying errorsarlm fit called $pWinternal showing how
convergence failed; series needs to fall below tol for convergence.
As your n is not large, you can compare the "eigen" and "Matrix" methods
to see how the Hausman test output diverges as the approximation to the
term needed deteriorates.
The underlying problem that you have touched is (as in the sections in
Bivand et al. 2013 on approximations) that using power series to generate
the product of the inverse of the nxn (I - \rho W) matrix and an arbitrary
vector also deteriotates when \rho is near its limits.
Thanks for the detailed results; I've deleted them in this reply to save
space.
Roger
> I did eliminate all
> polygons without neighbours for all my datasets before calling errorsarlm, so
> this should not be the problem. I include a full code example inline below
> (including documentation of the intermediate objects and the R outputs) for
> one dataset with warning and one without below in case you have time to look
> at it and find a possible error source that I have not identified.
>
> comments on crash also inline
>
> Am 30.09.2013 12:31, schrieb Roger Bivand:
>> On Mon, 30 Sep 2013, Philine Gaffron wrote:
>>
>>> Thank you for the response, Roger. My answers below. I have also added a
>>> question on a warning message I have received and would be glad to find
>>> out whether this means I need to change the analysis approach or just have
>>> to take it as additional information.
>>
>> Thanks for reporting back: some points for clarification inline.
>>
>>>
>>> Am 25.09.2013 00:53, schrieb Roger Bivand:
>>>> On Wed, 25 Sep 2013, Philine Gaffron wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am trying to run a spatial regression analysis using the errorsarlm
>>>>> function in the sdpep package. I first ran into memory problems (on a
>>>>> larger dataset than the one described below) but have adjusted the
>>>>> memory size limit:
>>>>>
>>>>>> memory.limit(size = 800000)
>>>>> [1] 8e+05
>>>>
>>>> Which sets your memory limit under Windows to 781 Gb (size is in Mb) -
>>>> that probably explains the error exit unless you have a lot of memory.
>>>> Best avoid making any changes from the default in a fresh session
>>>> unless you know exactly what you are doing.
>>>
>>> I did up the virtual memory of the machine to 800Gb first (it has 1Tb in
>>> total and there is currently nothing else running on it) so I hope that
>>> that was not the problem. Thanks for the hint, though.
>>>
>>>>
>>>>>
>>>>> However, when I try to execute the following code in R Studio, I
>>>>> receive the message "R Session aborted - R encountered a fatal error
>>>>> - The session was terminated"
>>>>
>>>> Always rerun any code with problems outside R Studio, because it can
>>>> interfere with the reporting of warnings and errors. Certainly
>>>> errorsarlm() has no reason to cause R to crash - and this is what you
>>>> are suggesting. If you can reproduce this outside R Studio, and
>>>> without setting memory limits, I may need to see your data to try to
>>>> reproduce any error exit from the R session.
>>>
>>> I tried this and had no more crashes - especially after I made the changes
>>> suggested/outlined below
>>
>> Does this mean that the R error exit occurred when you used R Studio, and
>> did not occur when R was run directly, using exactly the same data and
>> code? If so, please also let the R Studio people know, so that they can
>> continue debugging.
>
> I will try to reproduce this once I have understood / solved my
> non-convergence problem. If and when I do, I will send the code and data to
> the RStudio people and cc you (?).
>>
>>>
>>>>
>>>>>
>>>>>> ElDU.slm.PM25.white <- errorsarlm(PM_PM25 ~ race_white, data =
>>>>>> ElDorado_urban, listw=ElDoU_listw)
>>>>
>>>> Did you read the advice on ?errorsarlm about the use of alternative
>>>> methods, such as "LU" or "Matrix" for larger data sets? You are using
>>>> the default "eigen" method, but this should certainly have worked for
>>>> the modest n=2081. Can you reproduce the problem with a built-in data
>>>> set, such as elect80?
>>>
>>> I opted for the "LU" method and changed the weights style (see below). Is
>>> there any documentation that tells me the implications of using method
>>> ="LU"? I only found the note in the spdep documentation "LU_order default
>>> FALSE; used in “LU_prepermutate”, note warnings given for lu method" but
>>> nothing further. I did receive warnings in the output, though (for several
>>> but not all of my datasets) and I am not sure what they mean / what their
>>> implications are for the results. Do I need to choose another method? The
>>> warnings in every case were:
>>>
>>
>> See Bivand, Hauke & Kossowski (2013) Geographical Analysis for some
>> details; LU_prepermutate emulates the implementation by finding an initial
>> fill-in re-ordering, but in R has little speed-up effect, hence it is
>> turned off by default.
>>
>>> 1: In powerWeights(W = W, rho = lambda, order = con$pWOrder, X = B,:
>>> not converged within order iterations
>>>
>>> 2: In powerWeights(W = t(W), rho = lambda, order = con$pWOrder, X = C,:
>>> not converged within order iterations
>>
>> These are unusual, and may signal a mis-specified setting. The default
>> number of powers is 250, and lambda^250 is a very small number unless
>> lambda is close to its (probably upper) limit. The computation in fact only
>> affects the accuracy of the Hausman test on the difference between OLS and
>> spatial error coefficients, so may be adjusted if lambda is close to limits
>> by setting control argument pWOrder to a larger value than 250, see
>> ?powerWeights. As a symptom of lambda close to its limits, it suggests
>> mis-specification, possibly in terms of observation support, with
>> neighbouring properties sharing important drivers. This may be suggested by
>> your data scenario described below; scrutinising the intermediate objects
>> in your script may also put you on the track of things behaving in
>> unexpected ways. I would expect many observations to have no neighbours
>> with a 150m threshold, but you don't show no-neighbour control being turned
>> off.
> I removed all polygons without neighbours prior to calling errorsarlm - see
> example below.
>>
>> Hope the helps,
>>
>> Roger
>>
>>>
>>> I have not tried reproducing the error with the built-in dataset but if
>>> that is of interest to you from the point of view of spdep development, I
>>> could give it a try.
>>>
>>>>
>>>>>
>>>>> The data was imported from an ArcGIS shape file:
>>>>>> Parcels2008_inc_POP <- readOGR(".", "Parcels_2008_inc_pop")
>>>>>> ElDorado_urban <- subset(Parcels2008_inc_POP, COUNTYFP10 == "017")
>>>>>
>>>>> The subset contains data from 2081 polygons
>>>>>
>>>>> The listw object was created as follows:
>>>>>
>>>>>> Eld_U_coords <- coordinates(ElDorado_urban)
>>>>>> Eld_U_IDs <- row.names(ElDorado_urban)
>>>>>> ElDoU_nb <- dnearneigh(Eld_U_coords, d1 = 0, d2 = 492, row.names =
>>>>>> Eld_U_IDs)
>>>>
>>>> Is 492 a well-chosen upper distance threshold? I assume that the input
>>>> data are projected, so that these are measures in the units of the
>>>> projection - metres? If Eld_U_coords are actually geographical
>>>> coordinates, but longlat=NULL, d2 is taken as a planar measure, if you
>>>> declare longlat=TRUE, it is in km. There is no easy way to trap
>>>> undeclared geographical coordinates.
>>>
>>> The distance thresholds are in feet
>>> (NAD_1983_StatePlane_California_II_FIPS_0402_Feet) and are related to
>>> dispersion distances for road traffic emissions. For explanation: my
>>> polygons are land use parcels for which I have calculated emission loads
>>> in grams/area from traffic loads on a road network. Since I am assuming a
>>> dispersion distance of 492 feet / 150 m of emissions away from the network
>>> in the example above, I am further assuming, that parcels which are within
>>> that distance of one another are affected by the same emission sources
>>> (i.e. traffic loads on the network).
>>>
>>>>
>>>>>> ElDoU_listw <- nb2listw(ElDoU_nb, glist=NULL, style="S",
>>>>>> zero.policy=NULL)
>>>>>
>>>>
>>>> Does choosing a different style make a difference ("S" is not often
>>>> used)?
>>>
>>> I chose "S" as a variance stabilising coding schemes following Bivand,
>>> Pebesma et al. (2008), p. 253 but have now switched back to "W". Apart
>>> from the warnings cited above I encountered no more issues.
>>>
>>> Thank you for your help and for making this package available.
>>>
>>> Philine
>>>
>>>>
>>>> Roger
>>>>
>>>>> Is there anything inherent in my approach that could cause this
>>>>> problem or would one need to look at the data? I include the session
>>>>> info below.
>>>>>
>>>>> Thanks for any help,
>>>>>
>>>>> Philine
>>>>>
>>>>>> sessionInfo()
>>>>> R version 3.0.1 (2013-05-16)
>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
>>>>> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>>>>> [5] LC_TIME=English_United States.1252
>>>>>
>>>>> attached base packages:
>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>
>>>>> other attached packages:
>>>>> [1] spdep_0.5-65 Matrix_1.0-12 lattice_0.20-15 rgdal_0.8-11 sp_1.0-13
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] boot_1.3-9 coda_0.16-1 deldir_0.0-22 grid_3.0.1 LearnBayes_2.12
>>>>> MASS_7.3-26 nlme_3.1-109 splines_3.0.1 tools_3.0.1
>>>>>
>>>>>> library(help=spdep)
>>>>> Information on package ?spdep?
>>>>>
>>>>> Description:
>>>>>
>>>>> Package: spdep
>>>>> Version: 0.5-65
>>>>> Date: 2013-09-04
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>
--
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no
More information about the R-sig-Geo
mailing list