[R-sig-Geo] R encounters fatal error in errorsarlm - sdpep package
Roger Bivand
Roger.Bivand at nhh.no
Tue Oct 1 20:58:33 CEST 2013
On Tue, 1 Oct 2013, Roger Bivand wrote:
> On Tue, 1 Oct 2013, Philine Gaffron wrote:
>
>> Thank you for the Geographical Analysis reference, very helpful. I have
>> deduced that the method argument should be "Matrix" for my data rather than
>> "LU", since the "W" weights are symmetric. However, I am still getting the
>> warnings on non convergence and since you state that this maybe a
>> specification problem, I would like to find the error.
>
> I think that, unless you need the Hausman test from errorsarlm, just turn it
> off with errorsarlm(..., control=list(returnHcov=FALSE)); the ML fit is quite
> OK, it is only the Hausman test component that is giving the warnings for the
> spatial coefficient close to its limits, and when using other methods than
> "eigen". If you do need it and cannot use method="eigen", increase the order
> until the warning (it is a warning, not an error) goes away with
> errorsarlm(..., control=list(pWOrder=<big integer>)). I have had win-builder
> roll up a Windows binary of the development version with more details on the
> problem, on:
>
> http://win-builder.r-project.org/5IVvaH0M3Qsj
Use rather:
http://win-builder.r-project.org/21E63UI6JzHH
as the first attempt was not completely though through.
Roger
>
> See the examples for powerWeight, and look at the returned component for the
> warnining-displaying errorsarlm fit called $pWinternal showing how
> convergence failed; series needs to fall below tol for convergence.
>
> As your n is not large, you can compare the "eigen" and "Matrix" methods to
> see how the Hausman test output diverges as the approximation to the term
> needed deteriorates.
>
> The underlying problem that you have touched is (as in the sections in Bivand
> et al. 2013 on approximations) that using power series to generate the
> product of the inverse of the nxn (I - \rho W) matrix and an arbitrary vector
> also deteriotates when \rho is near its limits.
>
> Thanks for the detailed results; I've deleted them in this reply to save
> space.
>
> Roger
>
>> I did eliminate all polygons without neighbours for all my datasets before
>> calling errorsarlm, so this should not be the problem. I include a full
>> code example inline below (including documentation of the intermediate
>> objects and the R outputs) for one dataset with warning and one without
>> below in case you have time to look at it and find a possible error source
>> that I have not identified.
>>
>> comments on crash also inline
>>
>> Am 30.09.2013 12:31, schrieb Roger Bivand:
>>> On Mon, 30 Sep 2013, Philine Gaffron wrote:
>>>
>>>> Thank you for the response, Roger. My answers below. I have also added a
>>>> question on a warning message I have received and would be glad to find
>>>> out whether this means I need to change the analysis approach or just
>>>> have to take it as additional information.
>>>
>>> Thanks for reporting back: some points for clarification inline.
>>>
>>>>
>>>> Am 25.09.2013 00:53, schrieb Roger Bivand:
>>>>> On Wed, 25 Sep 2013, Philine Gaffron wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am trying to run a spatial regression analysis using the errorsarlm
>>>>>> function in the sdpep package. I first ran into memory problems (on a
>>>>>> larger dataset than the one described below) but have adjusted the
>>>>>> memory size limit:
>>>>>>
>>>>>>> memory.limit(size = 800000)
>>>>>> [1] 8e+05
>>>>>
>>>>> Which sets your memory limit under Windows to 781 Gb (size is in Mb) -
>>>>> that probably explains the error exit unless you have a lot of memory.
>>>>> Best avoid making any changes from the default in a fresh session
>>>>> unless you know exactly what you are doing.
>>>>
>>>> I did up the virtual memory of the machine to 800Gb first (it has 1Tb in
>>>> total and there is currently nothing else running on it) so I hope that
>>>> that was not the problem. Thanks for the hint, though.
>>>>
>>>>>
>>>>>>
>>>>>> However, when I try to execute the following code in R Studio, I
>>>>>> receive the message "R Session aborted - R encountered a fatal error
>>>>>> - The session was terminated"
>>>>>
>>>>> Always rerun any code with problems outside R Studio, because it can
>>>>> interfere with the reporting of warnings and errors. Certainly
>>>>> errorsarlm() has no reason to cause R to crash - and this is what you
>>>>> are suggesting. If you can reproduce this outside R Studio, and
>>>>> without setting memory limits, I may need to see your data to try to
>>>>> reproduce any error exit from the R session.
>>>>
>>>> I tried this and had no more crashes - especially after I made the
>>>> changes suggested/outlined below
>>>
>>> Does this mean that the R error exit occurred when you used R Studio, and
>>> did not occur when R was run directly, using exactly the same data and
>>> code? If so, please also let the R Studio people know, so that they can
>>> continue debugging.
>>
>> I will try to reproduce this once I have understood / solved my
>> non-convergence problem. If and when I do, I will send the code and data to
>> the RStudio people and cc you (?).
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> ElDU.slm.PM25.white <- errorsarlm(PM_PM25 ~ race_white, data =
>>>>>>> ElDorado_urban, listw=ElDoU_listw)
>>>>>
>>>>> Did you read the advice on ?errorsarlm about the use of alternative
>>>>> methods, such as "LU" or "Matrix" for larger data sets? You are using
>>>>> the default "eigen" method, but this should certainly have worked for
>>>>> the modest n=2081. Can you reproduce the problem with a built-in data
>>>>> set, such as elect80?
>>>>
>>>> I opted for the "LU" method and changed the weights style (see below). Is
>>>> there any documentation that tells me the implications of using method
>>>> ="LU"? I only found the note in the spdep documentation "LU_order default
>>>> FALSE; used in “LU_prepermutate”, note warnings given for lu method" but
>>>> nothing further. I did receive warnings in the output, though (for
>>>> several but not all of my datasets) and I am not sure what they mean /
>>>> what their implications are for the results. Do I need to choose another
>>>> method? The warnings in every case were:
>>>>
>>>
>>> See Bivand, Hauke & Kossowski (2013) Geographical Analysis for some
>>> details; LU_prepermutate emulates the implementation by finding an initial
>>> fill-in re-ordering, but in R has little speed-up effect, hence it is
>>> turned off by default.
>>>
>>>> 1: In powerWeights(W = W, rho = lambda, order = con$pWOrder, X = B,:
>>>> not converged within order iterations
>>>>
>>>> 2: In powerWeights(W = t(W), rho = lambda, order = con$pWOrder, X = C,:
>>>> not converged within order iterations
>>>
>>> These are unusual, and may signal a mis-specified setting. The default
>>> number of powers is 250, and lambda^250 is a very small number unless
>>> lambda is close to its (probably upper) limit. The computation in fact
>>> only affects the accuracy of the Hausman test on the difference between
>>> OLS and spatial error coefficients, so may be adjusted if lambda is close
>>> to limits by setting control argument pWOrder to a larger value than 250,
>>> see ?powerWeights. As a symptom of lambda close to its limits, it suggests
>>> mis-specification, possibly in terms of observation support, with
>>> neighbouring properties sharing important drivers. This may be suggested
>>> by your data scenario described below; scrutinising the intermediate
>>> objects in your script may also put you on the track of things behaving in
>>> unexpected ways. I would expect many observations to have no neighbours
>>> with a 150m threshold, but you don't show no-neighbour control being
>>> turned off.
>> I removed all polygons without neighbours prior to calling errorsarlm - see
>> example below.
>>>
>>> Hope the helps,
>>>
>>> Roger
>>>
>>>>
>>>> I have not tried reproducing the error with the built-in dataset but if
>>>> that is of interest to you from the point of view of spdep development, I
>>>> could give it a try.
>>>>
>>>>>
>>>>>>
>>>>>> The data was imported from an ArcGIS shape file:
>>>>>>> Parcels2008_inc_POP <- readOGR(".", "Parcels_2008_inc_pop")
>>>>>>> ElDorado_urban <- subset(Parcels2008_inc_POP, COUNTYFP10 == "017")
>>>>>>
>>>>>> The subset contains data from 2081 polygons
>>>>>>
>>>>>> The listw object was created as follows:
>>>>>>
>>>>>>> Eld_U_coords <- coordinates(ElDorado_urban)
>>>>>>> Eld_U_IDs <- row.names(ElDorado_urban)
>>>>>>> ElDoU_nb <- dnearneigh(Eld_U_coords, d1 = 0, d2 = 492, row.names =
>>>>>>> Eld_U_IDs)
>>>>>
>>>>> Is 492 a well-chosen upper distance threshold? I assume that the input
>>>>> data are projected, so that these are measures in the units of the
>>>>> projection - metres? If Eld_U_coords are actually geographical
>>>>> coordinates, but longlat=NULL, d2 is taken as a planar measure, if you
>>>>> declare longlat=TRUE, it is in km. There is no easy way to trap
>>>>> undeclared geographical coordinates.
>>>>
>>>> The distance thresholds are in feet
>>>> (NAD_1983_StatePlane_California_II_FIPS_0402_Feet) and are related to
>>>> dispersion distances for road traffic emissions. For explanation: my
>>>> polygons are land use parcels for which I have calculated emission loads
>>>> in grams/area from traffic loads on a road network. Since I am assuming a
>>>> dispersion distance of 492 feet / 150 m of emissions away from the
>>>> network in the example above, I am further assuming, that parcels which
>>>> are within that distance of one another are affected by the same emission
>>>> sources (i.e. traffic loads on the network).
>>>>
>>>>>
>>>>>>> ElDoU_listw <- nb2listw(ElDoU_nb, glist=NULL, style="S",
>>>>>>> zero.policy=NULL)
>>>>>>
>>>>>
>>>>> Does choosing a different style make a difference ("S" is not often
>>>>> used)?
>>>>
>>>> I chose "S" as a variance stabilising coding schemes following Bivand,
>>>> Pebesma et al. (2008), p. 253 but have now switched back to "W". Apart
>>>> from the warnings cited above I encountered no more issues.
>>>>
>>>> Thank you for your help and for making this package available.
>>>>
>>>> Philine
>>>>
>>>>>
>>>>> Roger
>>>>>
>>>>>> Is there anything inherent in my approach that could cause this
>>>>>> problem or would one need to look at the data? I include the session
>>>>>> info below.
>>>>>>
>>>>>> Thanks for any help,
>>>>>>
>>>>>> Philine
>>>>>>
>>>>>>> sessionInfo()
>>>>>> R version 3.0.1 (2013-05-16)
>>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>>>
>>>>>> locale:
>>>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
>>>>>> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>>>>>> [5] LC_TIME=English_United States.1252
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>>
>>>>>> other attached packages:
>>>>>> [1] spdep_0.5-65 Matrix_1.0-12 lattice_0.20-15 rgdal_0.8-11 sp_1.0-13
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] boot_1.3-9 coda_0.16-1 deldir_0.0-22 grid_3.0.1 LearnBayes_2.12
>>>>>> MASS_7.3-26 nlme_3.1-109 splines_3.0.1 tools_3.0.1
>>>>>>
>>>>>>> library(help=spdep)
>>>>>> Information on package ?spdep?
>>>>>>
>>>>>> Description:
>>>>>>
>>>>>> Package: spdep
>>>>>> Version: 0.5-65
>>>>>> Date: 2013-09-04
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
>
--
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no
More information about the R-sig-Geo
mailing list