[R-sig-Geo] Strange R2 values from spatial durbin model
Defriez, Emma
e.defriez11 at imperial.ac.uk
Wed Apr 2 13:37:41 CEST 2014
Ah I see, the spatial coefficient does seem to dominate my output so that would explain it. Thank you for your help.
Emma
-----Original Message-----
From: Roger Bivand [mailto:Roger.Bivand at nhh.no]
Sent: 02 April 2014 12:08
To: Defriez, Emma
Cc: 'r-sig-geo at r-project.org'
Subject: Re: [R-sig-Geo] Strange R2 values from spatial durbin model
On Wed, 2 Apr 2014, Defriez, Emma wrote:
> Hi, I am hoping someone might be able to help me. I have been fitting
> spatial models with spdep. The mixed model appears to best describe
> the spatial autocorrelation in the data and I am now trying to find
> the most parsimonious combination of variables. I have noticed that
> although the AIC and log likelihood values do change the Nagelkerke R2
> remains essentially the same for all models, showing the biggest
> change when a different weight matrix is used. Below is a small
> summary table for 3 different models and 4 weight matrices. A single
> variable model with only var1 explains the same amount of variation as
> the full model, which could be reasonable, but a completely different
> single variable model (var6) still explains the same amount of
> variation, which is very strange.
>
> I realise that R2 is not used to determine the most appropriate model
> with this kind of modelling, but it has raised alarm bells. Does this
> suggest that the model is misspecified? I don't know if I have made a
> mistake in my code somewhere that I cannot see.
The pseudo-R2 provided is the Nagelkerke, see:
http://en.wikipedia.org/wiki/Coefficient_of_determination#Generalized_R2
It compares the log likelihood of the null model (intercept only) with that of the fitted model. If the spatial coefficient "dominates" the outcome in the fitted model, what you see is to be expected. Please do not talk about this as "explained variance"; it is just a number, and (as in OLS) its use depends on lots of assumptions that may not be met. Nagelkerke R2, AIC and the LL value give some indications, but should never be trusted implicitly unless you know that the assumptions of all components are met.
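For reference, the generalized R2 described in the linked section can be written as below. The symbols are my own notation, not taken from the spdep output: L(0) is the likelihood of the intercept-only model, L(theta-hat) that of the fitted model, l their logarithms, and n the number of observations. Whether the reported value is additionally rescaled by its maximum is not stated here, so take this as a sketch of the unscaled form only.

```latex
R^2 \;=\; 1 - \left( \frac{L(0)}{L(\hat{\theta})} \right)^{2/n}
    \;=\; 1 - \exp\!\left( -\frac{2}{n}\,\bigl[\, \ell(\hat{\theta}) - \ell(0) \,\bigr] \right)
```

Because the log-likelihood gain is divided by n, two fitted models whose log likelihoods differ by a few hundred units can still give nearly the same value when n is large and most of the gain over the null model is common to both.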
Note that you need to be very careful in handling any model including the spatially lagged response, as its coefficients cannot be interpreted directly, and this may upset your variable selection process. Please see ?impacts for guidance on this.
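A minimal sketch of what ?impacts leads to, assuming current package releases in which lagsarlm(), trW() and impacts() live in spatialreg rather than spdep. It uses the oldcol example data shipped with spdep, not the chlorophyll data from this thread, and the formula and R = 200 are arbitrary choices for illustration.

```r
# Sketch only: example data from spdep, not the data discussed in this thread.
library(spdep)       # neighbours and spatial weights
library(spatialreg)  # lagsarlm(), trW(), impacts() in current releases

data(oldcol, package = "spdep")          # provides COL.OLD and COL.nb
listw <- nb2listw(COL.nb, style = "W")   # row-standardised weights

# Spatial Durbin ("mixed") model: lagged response plus lagged covariates
mod <- lagsarlm(CRIME ~ INC + HOVAL, data = COL.OLD, listw = listw,
                type = "mixed")

# Traces of powers of the weights matrix, reused by impacts()
W  <- as(listw, "CsparseMatrix")
tr <- trW(W, type = "mult")

# Direct, indirect and total impacts, with simulated distributions
set.seed(1)
imp <- impacts(mod, tr = tr, R = 200)
summary(imp, zstats = TRUE, short = TRUE)
```

The direct, indirect and total impacts, rather than the raw coefficients, are the quantities to compare when deciding which variables matter.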
Hope this clarifies,
Roger
>
>
> weight  formula                             AIC        R2         LogLik
> W       chl~ var1+var2+var3+var4+var5+var6  -34554.06  0.8910829  17292.03
> W       chl~ var1                           -33882.83  0.8886748  16946.41
> W       chl~ var6                           -33581.47  0.8876084  16795.74
> S       chl~ var1+var2+var3+var4+var5+var6  -28317.57  0.8673353  14174.79
> S       chl~ var1                           -26324.23  0.8586102  13168.12
> S       chl~ var6                           -24746.26  0.8513727  12379.13
> B       chl~ var1+var2+var3+var4+var5+var6  -24368.85  0.8496826  12200.43
> B       chl~ var1                           -21194.94  0.8337000  10603.47
> B       chl~ var6                           -19051.44  0.8220316  9531.72
> C       chl~ var1+var2+var3+var4+var5+var6  -24314.47  0.8494238  12173.24
> C       chl~ var1                           -20918.91  0.8322414  10465.45
> C       chl~ var6                           -18839.42  0.8208338  9425.71
>
>
> #############################################################
> ### Example of code for creating models; weight style and ols formula varied
> #############################################################
>
> coords<-cbind(longs,lats)
> coords<-as.matrix(coords)
>
> #######################
> #Define neighbourhood
> nn<-dnearneigh(coords,0,150,longlat = TRUE)
>
> #get inverse distances so further points have less influence
> dists <- nbdists(nn, coords)
> idw <- lapply(dists, function(x) 1/(x^2))
>
> #Spatial weights
> nnW<-nb2listw(nn, glist=idw, style="W",zero.policy=TRUE)
>
> #------------------------------------------- model ----------------------------------------------------------#
> ols<-lm(as.formula(chl~ var1+var2+var3+var4+var5+var6))
>
> hW <- as(as_dgRMatrix_listw(nnW), "CsparseMatrix")
> set.seed(123456)
> htr <- trW(hW, m = 24)
>
> mod<-lagsarlm(formula=formula(ols),listw=nnW, na.action=na.omit, type="mixed", method='LU',
> tol.solve=1e-16, zero.policy=TRUE, tr=htr)
>
>
> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] boot_1.3-9 spdep_0.5-68 Matrix_1.1-0 sp_1.0-14
>
> loaded via a namespace (and not attached):
> [1] coda_0.16-1 deldir_0.1-1 grid_3.0.2 lattice_0.20-24
> [5] LearnBayes_2.12 MASS_7.3-29 nlme_3.1-111 splines_3.0.2
>
> Thanks,
> Emma
>
> PhD student
> Faculty of Natural Sciences, Department of Life Sciences
> Imperial College London
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
--
Roger Bivand
Department of Economics, Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: Roger.Bivand at nhh.no