[R-sig-Geo] Moran's I with Objects of Different Length

Roger Bivand Roger@B|v@nd @end|ng |rom nhh@no
Wed Aug 12 18:34:32 CEST 2020


On Wed, 12 Aug 2020, Chanda Chiseni wrote:

> Hi Roger
>
> Thank you for your response.
>
> I am using DHS individual level data , they cluster individuals into about
> 25-30 people and assign them coordinates depending on where they were
> surveyed. So about 30 individuals have similar coordinates. Yes I do have
> more than one observation at one point as you have stated, this could be
> the source of my problem indeed, My question is how could i go about
> creating nearest neighbors and weights by clustering individuals that share
> the same point? I sought for a solution online but could not find any.

Searching online is seldom informative, there is a poor signal-to-noise 
ratio.

See for example the HSAR package or other multi-level approaches, review 
in https://doi.org/10.1016/j.spasta.2017.01.002 and in references in HSAR. 
You have a lower layer of observations in an upper layer of known survey 
locations, so a multi-level approach respects that structure, I think. 
Probably an upper level MRF in mgcv::gam() or try the hgml package, or 
INLA/BayesX for Bayesian approaches.

In these you fit a model with spatially structured random effects at the 
upper level (the weights probably need to be symmetric), which then show 
whether including this term improves the fit.

Hope this helps,

Roger

> Sorry I am a novice at spatial econometrics, this may be obvious to some ,
> so instead of testing for spatial correlation in my residuals, I should in
> essence test for spatial autocorrelation in my outcome variable, for
> instance in my analysis I am using a dummy variable of whether an
> individual is hiv positive or negative. And is there a specific way of
> using a moran.test() on a dummy variable?
>
>
> Kind Regards,
>
> Michael Chanda Chiseni
>
> Phd Candidate
>
> Department of Economic History
>
> Lund University
>
> Visiting address: Alfa 1, Scheelevägen 15 B, 22363 Lund
>
>
>
> *Africa is not poor, it is poorly managed (Ellen Johnson-Sirleaf ). *
>
>
>
>
>
>
> On Wed, Aug 12, 2020 at 12:55 PM Roger Bivand <Roger.Bivand using nhh.no> wrote:
>
>> On Tue, 11 Aug 2020, Chanda Chiseni wrote:
>>
>>> I am trying to perform moran's I test on residuals from by probit model
>>> using k-nearest neighbour weights, however i run into an error which i
>>> cant seem to find the solution anywhere online the error is
>>>
>>>> moran.test(residuals.glm(svyprobitest),knear2weight)
>>> Error in moran.test(residuals.glm(svyprobitest), knear2weight) :
>>>  objects of different length
>>>
>>>
>>> The codes I use to come up to this are below
>>>
>>>> library(foreign)
>>>> dhsanalysis2= read.dta("DHSSpatialreg11.dta")
>>> ##removing NAs in PSU
>>>> dhsanalysis3=subset(dhsanalysis2, !is.na(psu))
>>> ##generating a survey design
>>>> mydesighdhs= svydesign(ids =~psu, data =dhsanalysis3, weight = ~wgt,
>>> strata = ~v023,nest=TRUE)
>>> ##Defining coordinate colums
>>>> coordinates(dhsanalysis3)= c("longnum","latnum")
>>> #Defining Projection
>>>> proj4string(dhsanalysis3) <- CRS("+init=epsg:4326")
>>> ##binding longiude and latitude
>>>> lon2<- dhsanalysis3$longnum
>>>> lat2<- dhsanalysis3$latnum
>>>> coords<- cbind(lon2,lat2)
>>> ##Creating spatial weights based on the nearest neighbour
>>>> knear2= knearneigh(coords,k=2,longlat=T)
>>> Warning message:
>>> In knearneigh(coords, k = 2, longlat = T) :
>>>  knearneigh: identical points found ### I get a warning message on
>>> identical points found, is this a problem and how would i deal with this
>>
>> It usually indicates muddled thinking. You have more than one observation
>> at the same point, which could be an unintended duplicate (copy of the
>> same observation), or it could indicate that any spatial process has
>> multiple values at that point. In geostatistics, one would jitter the
>> points to introduce distance. In your case, this probably isn't households
>> living at one address point. All the observations at the same point will
>> get the same sets of neighbours. If there are many co-located
>> observations, k in k-nearest may be too small to include them all.
>>
>>>> knear2.nb= knn2nb(knear2)
>>>> knear2weight= nb2listw(knear2.nb, style="W",zero.policy=T)
>>>
>>> ##declaring categorical variables as factor variables
>>>> dhsanalysis3$hivpositive.f <- factor(dhsanalysis3$hivpositive)
>>>> dhsanalysis3$protestant.f <- factor(dhsanalysis3$protestant2)
>>>> dhsanalysis3$Married.f <- factor(dhsanalysis3$Married)
>>>> dhsanalysis3$female.f <- factor(dhsanalysis3$female)
>>>> dhsanalysis3$urban1.f <- factor(dhsanalysis3$urban1)
>>>> dhsanalysis3$river10kmdum.f <- factor(dhsanalysis3$river10kmdum)
>>>> dhsanalysis3$Explorer50kmdum.f <- factor(dhsanalysis3$Explorer50kmdum)
>>>> dhsanalysis3$Rail50kmdum.f <- factor(dhsanalysis3$Rail50kmdum)
>>>> dhsanalysis3$Province1.f <- factor(dhsanalysis3$Province1)
>>>> dhsanalysis3$WealthIndex.f <- factor(dhsanalysis3$WealthIndex)
>>>> dhsanalysis3$occupation2.f <- factor(dhsanalysis3$occupation2)
>>>> dhsanalysis3$highested.f <- factor(dhsanalysis3$highested)
>>>> protestant.f= dhsanalysis3$protestant.f
>>>> hivpositive.f=dhsanalysi3$hivpositive.f
>>>> Married.f=dhsanalysis3$Married.f
>>>> female.f=dhsanalysis3$female.f
>>>> urban1.f=dhsanalysis3$urban1.f
>>>> river10kmdum.f=dhsanalysis3$river10kmdum.f
>>>> Explorer50kmdum.f=dhsanalysis2$Explorer50kmdum.f
>>>> Province1.f=dhsanalysis3$Province1.f
>>>> WealthIndex.f=dhsanalysis3$WealthIndex.f
>>>> occupation2.f=dhsanalysis3$occupation2.f
>>>> highested.f=dhsanalysis3$highested.f
>>>> Age=dhsanalysis3$Age
>>>> Age2=dhsanalysis3$Age2
>>>> HIVKnowledge=dhsanalysis3$HIVKnowledge
>>>> churchkm=dhsanalysis3$churchkm
>>>> lnhospitalkm=dhsanalysis3$lnhospitalkm
>>>> lnElevationMean=dhsanalysis3$lnElevationMean
>>>
>>>> ###VARIABLES IN SURVEY DESIGN
>>>> mydesighdhs$hivpositive.f <- factor(dhsanalysis3$hivpositive)
>>>> mydesighdhs$protestant.f <- factor(dhsanalysis3$protestant2)
>>>> mydesighdhs$Married.f <- factor(dhsanalysis3$Married)
>>>> mydesighdhs$female.f <- factor(dhsanalysis3$female)
>>>> mydesighdhs$urban1.f <- factor(dhsanalysis3$urban1)
>>>> mydesighdhs$river10kmdum.f <- factor(dhsanalysis3$river10kmdum)
>>>> mydesighdhs$Explorer50kmdum.f <- factor(dhsanalysis3$Explorer50kmdum)
>>>> mydesighdhs$Rail50kmdum.f <- factor(dhsanalysis3$Rail50kmdum)
>>>> mydesighdhs$Province1.f <- factor(dhsanalysis3$Province1)
>>>> mydesighdhs$WealthIndex.f <- factor(dhsanalysis3$WealthIndex)
>>>> mydesighdhs$occupation2.f <- factor(dhsanalysis3$occupation2)
>>>> mydesighdhs$highested.f <- factor(dhsanalysis3$highested)
>>>> mydesighdhs$Age=Age
>>>> mydesighdhs$Age2=Age2
>>>> mydesighdhs$HIVKnowledge=HIVKnowledge
>>>> mydesighdhs$churchkm=churchkm
>>>> mydesighdhs$lnhospitalkm=lnhospitalkm
>>>> mydesighdhs$lnElevationMean=lnElevationMean
>>>
>>>> svyprobitest= svyglm(hivpositive.f~ churchkm +lnhospitalkm
>> +protestant.f+
>>> Age+
>>>
>> Age2+Married.f+female.f+urban1.f+river10kmdum.f+Explorer50kmdum.f+Rail50kmdum.f+lnElevationMean+Province1.f
>>> + WealthIndex.f + HIVKnowledge+ occupation2.f+
>>> highested.f,design=mydesighdhs,family = quasibinomial(link =
>>> "probit"),data=dhsanalysis3)
>>> Error in .subset2(x, i, exact = exact) : subscript out of bounds
>>
>> So you've an error anyway.
>>
>>>
>>> ## Iran my probit model without implementing the survey design in the
>> model
>>> just to see whether Moran test is working
>>>> svyprobitest2= glm(hivpositive.f~ churchkm +lnhospitalkm +protestant.f+
>>> Age+
>>>
>> Age2+Married.f+female.f+urban1.f+river10kmdum.f+Explorer50kmdum.f+Rail50kmdum.f+lnElevationMean+Province1.f
>>> + WealthIndex.f + HIVKnowledge+ occupation2.f+ highested.f,family =
>>> quasibinomial(link = "probit"),data=dhsanalysis3)
>>> #Moran test
>>>> moran.test(residuals.glm(svyprobitest),knear2weight)
>>> Error in moran.test(residuals.glm(svyprobitest), knear2weight) :
>>>  objects of different length
>>>
>>
>> Testing residuals using moran.test() is usually wrong, as the expectation
>> and variance of the statistic are based on a null model (intercept only),
>> not a model with covariates.
>>
>> What are length(residuals.glm(svyprobitest)) and length(knear2.nb)? Did
>> glm drop observations with missing values? If so, you should subset both
>> the data submitted to glm() and the neighbour object so that they match.
>>
>> Hope this helps,
>>
>> Roger
>>
>>>
>>> Kind Regards,
>>>
>>> Michael Chanda Chiseni
>>>
>>> Phd Candidate
>>>
>>> Department of Economic History
>>>
>>> Lund University
>>>
>>> Visiting address: Alfa 1, Scheelevägen 15 B, 22363 Lund
>>>
>>>
>>>
>>> *Africa is not poor, it is poorly managed (Ellen Johnson-Sirleaf ). *
>>>
>>>       [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo using r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>
>> --
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
>> https://orcid.org/0000-0003-2392-6140
>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en


More information about the R-sig-Geo mailing list