[R-sig-Geo] Question spdep package - lagsarlm not terminating

Tue May 7 07:35:00 CEST 2019

Dear Roger,

Thank you again for your fast response and the helpful answers and clarifications. I will try to resolve all the issues with your recommendations in the coming days.

Just a few comments here:

I created the listw object with nb2listw() (without any error messages) and set "zero.policy=TRUE" when calling the function. However, I used the function subset() instead of subset.nb(). I will play around with it a bit and see which works best for me. But it is good to know that observations with missing values are dropped by default and the weights matrix is adjusted to match.
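
A minimal sketch of the subsetting step, assuming a small 3 x 4 rook grid built with spdep::cell2nb() as a stand-in for the real grid; the object names (nb_full, keep, nb_sub, lw_sub) are hypothetical. As far as I can tell, subset() is a generic, so calling it on an nb object should dispatch to subset.nb() anyway:

```r
library(spdep)

# Hypothetical stand-in for the real grid: 3 rows x 4 columns, rook contiguity
nb_full <- cell2nb(3, 4, type = "rook")

# Logical vector of observations to keep; here every neighbour of cell 1
# is dropped on purpose, so that cell 1 ends up without neighbours
keep <- rep(TRUE, length(nb_full))
keep[nb_full[[1]]] <- FALSE

# subset() on an nb object drops the cells and re-indexes the remaining ids
nb_sub <- subset(nb_full, keep)

# zero.policy = TRUE lets nb2listw() accept the no-neighbour observation
lw_sub <- nb2listw(nb_sub, style = "W", zero.policy = TRUE)

length(nb_sub)  # number of remaining observations
card(nb_sub)    # neighbour counts; a 0 marks a no-neighbour observation
```

The subset argument is a logical vector of the same length as the nb object, not a vector of row numbers, so an index vector would first have to be converted, for example with seq_along(nb_full) %in% index.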

With regard to your question about GWR: Yes, the error occurs while printing a fitted object. And I also thought that it might be linked to collinearity in the covariates, just as is the case for the error with lagsarlm(). However, if you do not recommend it for any other purposes, I will have to discuss its use with my thesis supervisor again.

Have a nice day!

Best regards,

Raphael

> On 06.05.2019 at 17:20, Roger Bivand <Roger.Bivand using nhh.no> wrote:
> 
> On Mon, 6 May 2019, Raphael Mesaric via R-sig-Geo wrote:
> 
>> 
>>> Dear Roger,
>>> 
>>> Thank you very much for your fast response.
>>> 
>>> My weights matrix stems from a rectangular (but not square) grid. So, I think I will have to use the "LU" method.
> 
> No, this is a misunderstanding. All weights matrices used for fitting models are square nxn matrices by definition. If your grid had r rows and c columns, n = r * c, and the weights matrix has n rows and n columns.
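
A small sketch of the point about dimensions, assuming a hypothetical 3 x 4 rook grid built with spdep::cell2nb() (the names n_row, n_col, nb and W are made up for illustration). The grid is rectangular, but the weights matrix derived from it is square:

```r
library(spdep)

n_row <- 3
n_col <- 4

# Neighbour list for a rectangular (non-square) grid with n = 12 cells
nb <- cell2nb(n_row, n_col, type = "rook")

# Dense binary weights matrix: one row and one column per cell
W <- nb2mat(nb, style = "B")

length(nb)  # n = n_row * n_col
dim(W)      # n rows and n columns
```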
> 
>>> 
>>> However, by doing that several other error messages popped up. The first one was the following:
>>> 
>>> Error in spatialreg::errorsarlm(formula = formula, data = data, listw = listw, :
>>>  NAs in lagged dependent variable
>>> In addition: Warning messages:
>>> 1: Function errorsarlm moved to the spatialreg package
>>> 2: In lag.listw(listw, y, zero.policy = zero.policy) :
>>>  NAs in lagged values
>>> 
>>> And this happened even though I do not actually have NaN values in my dataset (I removed them in advance). When I completely reload all variables, it sometimes works for a few tries, but after a while the error message pops up again. Do you know why this might happen? And could I theoretically use a dataset with NaN values and have the function omit them?
>>> 
> 
> By default, observations with missing values are dropped and the weights object is subsetted to match. This may produce no-neighbour observations, which are the probable cause of your error. You can choose to set their spatially lagged values to zero by using zero.policy=TRUE. It could very well be that your weights object itself contains no-neighbour observations if it was not created using spdep::nb2listw(), which would alert you to the problem.
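
A hedged sketch of how a no-neighbour observation can lead to the "NAs in lagged values" warning, assuming a hypothetical 3 x 3 rook grid in which all neighbours of cell 1 are dropped; the object names are made up, and this only mimics the situation rather than reproducing the original data:

```r
library(spdep)

# Hypothetical 3 x 3 rook grid; drop every neighbour of cell 1
nb_full <- cell2nb(3, 3, type = "rook")
keep <- rep(TRUE, length(nb_full))
keep[nb_full[[1]]] <- FALSE
nb_sub <- subset(nb_full, keep)

# Weights object that tolerates the no-neighbour observation
lw <- nb2listw(nb_sub, style = "W", zero.policy = TRUE)

# Complete data, no missing values at all
y <- as.numeric(seq_along(nb_sub))

# Without zero.policy the lag of the isolated cell is NA (with a warning)
lag_na <- lag.listw(lw, y, zero.policy = FALSE)

# With zero.policy = TRUE the lag of the isolated cell is set to zero
lag_zero <- lag.listw(lw, y, zero.policy = TRUE)

which(is.na(lag_na))  # position(s) of the no-neighbour observation(s)
```

So NAs in the lagged variable can appear even when the data themselves are complete.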
> 
>>> 
>>> The second error message was the following:
>>> 
>>> Error in solve.default(-(mat), tol.solve = tol.solve) :
>>>  system is computationally singular: reciprocal condition number =
>>>  2.71727e-26
>>> 
>>> I found a reference online (related to glm) that this might be linked to explanatory variables which have a high correlation. Would eliminating some of the variables do the job here as well?
>>> 
> 
> Please do not use references online (especially if you do not link to them). The information you need is in the help page, referring to the tol.solve= argument. Indeed, your covariates are scaled such that they are either collinear (unlikely), or the numerical values of the coefficient variances are very different from that of the spatial coefficient. You may need to re-scale the response or covariates in order to invert the matrix needed to yield the variances of the coefficients.
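
A base R sketch of why re-scaling can help, assuming two hypothetical covariates (x_big, x_small) on very different scales. It illustrates the general conditioning issue with a cross-product matrix only; it is not the exact matrix that lagsarlm() or errorsarlm() inverts:

```r
set.seed(1)
n <- 200

# Hypothetical covariates on wildly different scales
x_big   <- rnorm(n, mean = 5e6,  sd = 1e6)
x_small <- rnorm(n, mean = 2e-3, sd = 1e-3)

# Design matrices with an intercept, before and after standardising
X_raw    <- cbind(1, x_big, x_small)
X_scaled <- cbind(1, scale(x_big), scale(x_small))

# Reciprocal condition numbers of X'X: values near 0 mean that the matrix
# is numerically close to singular, values near 1 mean well conditioned
rcond_raw    <- rcond(crossprod(X_raw))
rcond_scaled <- rcond(crossprod(X_scaled))

c(raw = rcond_raw, scaled = rcond_scaled)
```

The covariates are not collinear here, yet the unscaled matrix is badly conditioned purely because of the units of measurement.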
> 
>>> 
>>> Then I have two other questions, where I couldn’t find any answers online:
>>> 
>>> Is there an option to eliminate specific rows of a nb object? I tried the following command:
>>> 
>>> rook234x259_2 <- rook234x259[index],
> 
> spdep::subset.nb() only. Your suggestion only creates a big mess, as subsetting reduces the number of rows, and neighbours are indexed between 1 and n. Consequently, at least some of the remaining vectors will point to neighbours with ids > n, and many of the others will point to the wrong neighbours. You can convert to a sparse matrix, subset using "[", and then convert back, but the subset method should work.
> 
>>> 
>>> where index is a vector with the row numbers I would like to keep. But this converts the nb to a list object, and there is no list2nb function (as far as I am aware). And the option "subset" not only removes the rows but also the corresponding cells, which leads to a different neighbourhood matrix.
> 
> I do not know what you mean, of course it has to remove links in both directions.
> 
>>> 
>>> 
>>> The last question refers to the package spgwr. I would like to run this model as well. The gwr function runs successfully, but when calling the variable where I stored the result, I get the following error message instead of my coefficients:
>>> 
>>> Error in abs(coef.se <- xm[, cs.ind, drop = FALSE]) :
>>>  non-numeric argument to mathematical function
>>> In addition: Warning messages:
>>> 1: In print.gwr(x) : NAs in coefficients dropped
>>> 2: In cbind(CM, coefficients(x$lm)) :
>>>  number of rows of result is not a multiple of vector length (arg 2)
>>> 
> 
> I always advise against using GWR for anything other than studies exploring its weaknesses - do not use in research or production. Just showing the error message without a small reproducible example using a built-in data set gives nothing to go on. Does the error occur when using print() on a fitted object? Maybe NAs in coefficients suggests collinearity in your covariates, possibly after weighting.
> 
>>> Again, there aren’t any NaN values in my dataset, so I can’t really imagine where the NAs in the coefficients come from.
> 
> But it is terribly easy to introduce them numerically, so they are coming from what you are doing.
> 
> Roger
> 
>>> 
>>> 
>>> Thank you very much for your help!
>>> 
>>> Best regards,
>>> 
>>> Raphael
>>> 
>>> 
>>>> On 05.05.2019 at 15:21, Roger Bivand <Roger.Bivand using nhh.no> wrote:
>>>> 
>>>> On Sat, 4 May 2019, Raphael Mesaric via R-sig-Geo wrote:
>>>> 
>>>>> Dear all,
>>>>> 
>>>>> I have a question with regard to the function "lagsarlm" from the package spdep. My problem is that the function is not terminating. Of course, I have quite a big grid (depending on the selection either 34000 or 60600 entries) and I also have a lot of explanatory variables (about 40). But I am still wondering whether there is something wrong.
>>>> 
>>>> If you read the help page (the function is in the spatialreg package and will be dropped from spdep shortly), and look at the references (Bivand et al. 2013), you will see that the default value of the method= argument is "eigen". For small numbers of observations, solving the eigenproblem of a dense weights object is not a problem, but it becomes demanding on memory as n increases. You are using virtual memory (and may run out of that too), which makes your machine unresponsive. Choose an alternative method instead, typically "Matrix" for symmetric or similar to symmetric sparse weights, or "LU" for asymmetric sparse weights.
>>>> 
>>>>> data(house, package="spData")
>>>>> dim(house)
>>>> [1] 25357    24
>>>>> LO_nb
>>>> Neighbour list object:
>>>> Number of regions: 25357
>>>> Number of nonzero links: 74874
>>>> Percentage nonzero weights: 0.01164489
>>>> Average number of links: 2.952794
>>>>> lw <- spdep::nb2listw(LO_nb)
>>>>> system.time(res <- spatialreg::lagsarlm(log(price) ~ TLA + frontage +
>>>> + rooms + yrbuilt, data=house, listw=lw, method="Matrix"))
>>>>  user  system elapsed
>>>> 0.606   0.011   0.631
>>>> 
>>>> so less than 1 second on a standard laptop for similar to symmetric very sparse weights and ~ 25000 observations. Less sparse weights take somewhat longer. The function does not (maybe yet) prevent users trying to do things that are not advisable, because maybe they have 128GB RAM or more, and want to use eigenvalues rather than sparse matrix methods.
>>>> 
>>>> Hope this clarifies,
>>>> 
>>>> Roger
>>>> 
>>>>> 
>>>>> I tried to run a model based on the dataset "columbus", and there I did not have any problems (but there are far fewer entries and variables). I also compared the format of the required inputs, but everything seemed to be equivalent to the inputs used for the "columbus" model.
>>>>> 
>>>>> Do you have any idea what might be the reason for the extremely long (respectively infinite, it has not terminated yet) computation time? Any suggestions are greatly appreciated.
>>>>> 
>>>>> If you would like to, I can also upload the corresponding code. However, the code includes some MAT-Files as I got the data in MATLAB. I do not yet attach them here because I read that attachments in another format than PDF are not desired as they may contain malicious software.
>>>>> 
>>>>> Thank you for your help in advance!
>>>>> 
>>>>> Best regards,
>>>>> 
>>>>> Raphael Mesaric
>>>>> _______________________________________________
>>>>> R-sig-Geo mailing list
>>>>> R-sig-Geo using r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>> 
>>>> 
>>>> --
>>>> Roger Bivand
>>>> Department of Economics, Norwegian School of Economics,
>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>> voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
>>>> https://orcid.org/0000-0003-2392-6140
>>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>> 
>> 
> 
