[R-sig-Geo] Strange R2 values from spatial durbin model

Roger Bivand Roger.Bivand at nhh.no
Wed Apr 2 13:07:57 CEST 2014


On Wed, 2 Apr 2014, Defriez, Emma wrote:

> Hi, I am hoping someone might be able to help me. I have been fitting 
> spatial models with spdep. The mixed model appears to best describe the 
> spatial autocorrelation in the data and I am now trying to find the most 
> parsimonious combination of variables. I have noticed that although the 
> AIC and log likelihood values do change, the Nagelkerke R2 remains 
> essentially the same for all models, showing the biggest change when a 
> different weight matrix is used. Below is a small summary table for 3 
> different models and 4 weight matrices. A single variable model with 
> only var1 explains the same amount of variation as the full model, which 
> could be reasonable, but a completely different single variable model 
> (var6) still explains the same amount of variation, which is very 
> strange.
>
> I realise that R2 is not used to determine the most appropriate model 
> with this kind of modelling, but it has raised alarm bells. Does this 
> suggest that the model is misspecified? I don't know if I have made a 
> mistake in my code somewhere that I cannot see.

The pseudo-R2 provided is the Nagelkerke, see:

http://en.wikipedia.org/wiki/Coefficient_of_determination#Generalized_R2

comparing the log likelihood value of the null model (intercept only) with 
that of the fitted model. If the spatial coefficient "dominates" the outcome 
in the fitted model, what you see is to be expected. Please do not talk about 
this as "explained variance", it is just a number, and (as in OLS) its use 
depends on lots of assumptions that may not be met. Nagelkerke R2, AIC and 
the LL value give some indications, but should never be trusted implicitly 
unless you know that the assumptions of all components are met.
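
As a minimal sketch, assuming the reported value follows the generalized R2 
formula on that page, 1 - exp(-(2/n) * (LL_fit - LL_null)), the function 
below shows why the number barely moves once the gain over the null model is 
large. The helper name and all numbers are hypothetical, chosen only for 
illustration; they are not taken from the models in this thread.

```r
# Generalized (Nagelkerke-type) pseudo-R2 from two log likelihoods.
# nagelkerke_r2 is a hypothetical helper, not an spdep function.
nagelkerke_r2 <- function(ll_fit, ll_null, n) {
  1 - exp(-(2 / n) * (ll_fit - ll_null))
}

# Hypothetical values: number of observations, intercept-only log
# likelihood, and two fitted models whose log likelihoods differ by 10.
n <- 1000
ll_null <- -500
r2_a <- nagelkerke_r2(600, ll_null, n)
r2_b <- nagelkerke_r2(590, ll_null, n)

# Both models gain a lot over the null, so the two values are close.
print(c(model_a = r2_a, model_b = r2_b, difference = r2_a - r2_b))
```

If most of the gain over the intercept-only model comes from the spatial 
coefficient, swapping covariates changes LL_fit only a little relative to 
that gain, and the pseudo-R2 stays almost constant.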

Note that you need to be very careful in handling any model including the 
spatially lagged response, as its coefficients cannot be interpreted 
directly, and this may upset your variable selection process. Please see 
?impacts for guidance on this.
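
A minimal sketch of what ?impacts leads to, using the oldcol example data 
rather than the data in this thread. It assumes a current installation, 
where lagsarlm() and impacts() are provided by the spatialreg package (in 
spdep 0.5-68 they were part of spdep itself); the variable names come from 
that example data set.

```r
library(spdep)       # neighbours and spatial weights (nb2listw)
library(spatialreg)  # lagsarlm() and impacts() in current releases

# Example data shipped with spdep: COL.OLD (data) and COL.nb (neighbours)
data(oldcol, package = "spdep")
lw <- nb2listw(COL.nb, style = "W")

# Spatial Durbin ("mixed") model, the same model type as in the question
mod <- lagsarlm(CRIME ~ INC + HOVAL, data = COL.OLD, listw = lw,
                type = "mixed")

# Direct, indirect and total impacts for each covariate; these, rather
# than the raw coefficients, are the quantities to compare across variables
imp <- impacts(mod, listw = lw)
print(imp)
```

For a large data set, the traces already computed with trW() can be passed 
as tr= instead of listw=, and setting R= requests simulated distributions 
of the impacts for inference.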

Hope this clarifies,

Roger



>
>
> weight  formula                                     AIC         R2     LogLik
> W       chl~ var1+var2+var3+var4+var5+var6    -34554.06  0.8910829   17292.03
> W       chl~ var1                             -33882.83  0.8886748   16946.41
> W       chl~ var6                             -33581.47  0.8876084   16795.74
> S       chl~ var1+var2+var3+var4+var5+var6    -28317.57  0.8673353   14174.79
> S       chl~ var1                             -26324.23  0.8586102   13168.12
> S       chl~ var6                             -24746.26  0.8513727   12379.13
> B       chl~ var1+var2+var3+var4+var5+var6    -24368.85  0.8496826   12200.43
> B       chl~ var1                             -21194.94  0.8337000   10603.47
> B       chl~ var6                             -19051.44  0.8220316    9531.72
> C       chl~ var1+var2+var3+var4+var5+var6    -24314.47  0.8494238   12173.24
> C       chl~ var1                             -20918.91  0.8322414   10465.45
> C       chl~ var6                             -18839.42  0.8208338    9425.71
>
>
> #############################################################
> ### Example of code for creating models, weight style and ols formula varied
> #############################################################
>
> coords<-cbind(longs,lats)
> coords<-as.matrix(coords)
>
> #######################
> #Define neighbourhood
> nn<-dnearneigh(coords,0,150,longlat = TRUE)
>
> #get inverse distances so further points have less influence
> dists <- nbdists(nn, coords)
> idw <- lapply(dists, function(x) 1/(x^2))
>
> #Spatial weights
> nnW<-nb2listw(nn, glist=idw, style="W", zero.policy=TRUE)
>
> #-------------------------------------------  model ----------------------------------------------------------#
> ols<-lm(as.formula(chl~ var1+var2+var3+var4+var5+var6))
>
> hW <- as(as_dgRMatrix_listw(nnW), "CsparseMatrix")
>  set.seed(123456)
>  htr <- trW(hW, m = 24)
>
>  mod<-lagsarlm(formula=formula(ols),listw=nnW, na.action=na.omit, type="mixed", method='LU',
>              tol.solve=1e-16, zero.policy=TRUE, tr=htr)
>
>
> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
> [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
> [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] boot_1.3-9   spdep_0.5-68 Matrix_1.1-0 sp_1.0-14
>
> loaded via a namespace (and not attached):
> [1] coda_0.16-1     deldir_0.1-1    grid_3.0.2      lattice_0.20-24
> [5] LearnBayes_2.12 MASS_7.3-29     nlme_3.1-111    splines_3.0.2
>
> Thanks,
> Emma
>
> PhD student 
> Faculty of Natural Sciences, Department of Life Sciences
> Imperial College London 
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: Roger.Bivand at nhh.no
