[R-sig-Geo] contradicting measures based on Log likelihood and AIC in spatial models

Tue Dec 21 08:59:14 CET 2010

On Mon, 20 Dec 2010, elaine kuo wrote:

> Dear Dr. Bivand,
>
> Thank you for the clear explanation.
> Your guess is right, and the missing code is pasted following each question.
> I will check the book you recommended and please advise for improvement of
> the code below.

Elaine,

There are a number of issues with your code; you have been ignoring a 
warning after running the CAR model too - the function should have 
reported:

Non-symmetric spatial weights in CAR model

which are not permitted, but may be valid if case weights are used too. 
You are also using na.omit, which suggests that your data is not complete 
- this may introduce additional problems.

I'm concerned that your distance based weights may also be incorrectly 
specified, as you are using distances on the plane, but I suspect that you 
should be using Great Circle distances.

Since this is a use case in which the spatial coefficients are hitting 
their upper bounds, I would be interested in seeing your data and script, 
if you can make then available, just in case there is something that ought 
to be fixed in the function. Could you please either attach 
Mig_ratio_20101214.csv to an offlist email reply, or if it is large, put 
it on a website and send the URL?

Best wishes,

Roger

>
> Thanks
>
> Elaine
>
>
>
>> (Moran? s I = 0.52)
>>>
>>
>> Was this using lm.morantest() - it should have been?
>
>   =>  Yes
>   # code
>      mig.moran<-lm.morantest(mig.std,nb8.w)
>       print(mig.moran)
>
>
>> (sample size 4873, explanatory variable number: 6)
>>
>>>
>>>
>>>
>>> After trying SAR and CAR in package spdep, the results are as followed.
>>>
>>>
>> Neither of the fits are credible, as the line search has terminated in both
>> cases at its upper limit. These is something seriously wrong with your
>> analysis. Which method= argument were you using, presumably in spautolm? You
>> did not quote the exact way in which you called the functions used - this
>> will be where your problems lie.
>>
>   => Yes, spautolm was used.
>
>    summary(sapply(listw$weights, sum)
>   => It shows
>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>      1       1       1       1       1       1
>
> # code
>>
>    datam <-read.csv("c:/migration/Mig_ratio_20101214.csv",header=T,
> row.names=1)
>    dim(datam)
>
>     library(ncf)
> library(spdep)
>
> # get the upper bound
>  up <- knearneigh(cbind(datam$lon,datam$lat))
>  upknn <- knn2nb(up)
>  updist1 <- nbdists(upknn,cbind(datam$lon,datam$lat))
>  updist1
>  updistvec <- unlist(updist1)
>  updistvec
>  upmaxd <- max(updistvec)
>  upmaxd  #the upper bound  (8.1124)
>
> #******************************************************************************************
> # spatial listw object
>
>    # Define coordinates, neighbours, and spatial weights
>      coords<-cbind(datam$lon,datam$lat)
>      coords<-as.matrix(coords)
>
>    # Define neighbourhood (here distance 8)
>      nb8<-dnearneigh(coords,0,8.12)
>      summary(nb8)
>
>      #length(nb8)
>      #sum(card(nb8))
>
>    # Spatial weights, illustrated with coding style "W" (row standardized)
>      nb8.w<-nb2listw(nb8, glist=NULL, style="W")
>
>    # regression
>
>    # std model
>       datam.sd<-scale(datam)
>       datam.std<-as.data.frame(datam.sd)
>       summary (datam.std)
>       mean(datam.std)
>
>    # obtain standard deviation
>      sd(datam.std)
>
>    # model
>    mig.std <-lm(SMB~t_r+e_r+p_e+a_r+
>    coast+Iso_1214,data=datam.std)
>
>    summary(mig.std)
>
>    # SAR of spautolm
>     mignb8.sar <- spautolm(SMB~t_r+e_r+p_e+a_r+
>    coast+Iso_1214,data=datam.std,
>     listw=nb8.w, na.action=na.omit, zero.policy=TRUE,
>     family="SAR", method="Matrix", verbose=TRUE)
>     summary(mignb8.sar,Nagelkerke = TRUE)
>
>
>   # CAR of spautolm
>     mignb8.car <- spautolm(SMB~t_r+e_r+p_e+a_r+
>    coast+Iso_1214,data=datam.std,
>     listw=nb8.w, na.action=na.omit, , zero.policy=TRUE,
>     family="CAR", method="Matrix", verbose=TRUE)
>     summary(mignb8.car,Nagelkerke = TRUE)
>
>
>> Are you using a weights matrix with very small values (and very small row
>> sums), and a sparse matrix method? Do you get different outcomes if you set
>> the search interval yourself - or accept the default and row-standardise the
>> weights (style="W" in nb2listw)? Look at summary(sapply(listw$weights, sum))
>> where listw is your listw object. I'll add a warning to spautolm() to test
>> whether the result is equal to a line search bound.
>>
>
>
>
>
>>
>> In reply to your question, had your analysis not been flawed, you could not
>> judge between CAR and SAR in the way you suggest as the models are not
>> strictly nested. Please see Ripley (1981) Spatial Statistics, Wiley, p. 90,
>> for a discussion of the relationship between SAR and CAR representations.
>> Briefly, if we term the CAR weights matrix C and the SAR weights matrix S,
>> then C = S + S' - S'S, and if LL' is the Cholesky decomposition of (I - C),
>> S = I - L'. If you do the full analysis of this relationship, you may be
>> able to proceed, but you also need to consider the interpretation of any
>> results.
>>
>> Hope this helps,
>>
>> Roger
>>
>>
>>  I would like to learn which model was better fit.
>>>
>>> However, the measures based on log likelihood and AIC imply different
>>> contradictions.
>>>
>>> 1. log likehood
>>>
>>>  SAR is better than CAR (4919,629 > 3694.246)
>>>
>>> 2. AIC
>>>
>>>  CAR is better than SAR (-7370.5 > -9821.3)
>>>
>>> Please kindly instruct which criterion I should follow, and advice on any
>>> other measure will be highly appreciated.
>>>
>>> Elaine
>>>
>>>
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>> SAR
>>> Lambda: 0.999 LR test value: 13618 p-value: < 2.22e-16
>>> Log likelihood: 4919.629
>>> ML residual variance (sigma squared): 0.0075669, (sigma: 0.086988)
>>>
>>> Number of observations: 4873
>>>
>>> Number of parameters estimated: 9
>>>
>>> AIC: -9821.3
>>>
>>> Nagelkerke pseudo-R-squared: 1
>>>
>>>
>>>
>>> CAR
>>>
>>> Lambda: 0.999 LR test value: 11167 p-value: < 2.22e-16
>>>
>>>
>>>
>>> Log likelihood: 3694.246
>>>
>>> ML residual variance (sigma squared): 0.012682, (sigma: 0.11261)
>>>
>>> Number of observations: 4873
>>>
>>> Number of parameters estimated: 9
>>>
>>> AIC: -7370.5
>>>
>>> Nagelkerke pseudo-R-squared: 1
>>>
>>>
>> --
>> Roger Bivand
>> Economic Geography Section, Department of Economics, Norwegian School of
>> Economics and Business Administration, Helleveien 30, N-5045 Bergen,
>> Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
>> e-mail: Roger.Bivand at nhh.no
>>
>>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no