[R-sig-Geo] Removing spatial autocorrelation - Memory limits

Roger Bivand Roger.Bivand at nhh.no
Mon Dec 12 14:37:15 CET 2011


On Mon, 12 Dec 2011, Chris Mcowen wrote:

> Thanks Roger,
>
> I am aware it may look like I did not read the help page, but I did.
>
> I tried to include the longlat argument and got this:
>
> coords<-cbind(Cells$Lat,Cells$Lon)
> nb1.5<-dnearneigh(coords,0,1.5, longlat = TRUE)
> Warning message:
> In dnearneigh(coords, 0, 1.5, longlat = TRUE) :
>  Coordinates are not geographical: longlat argument wrong
>
>
>> head(Cells$Lon)
> [1] 121.75 120.75 121.25 121.75 122.25 119.75
>> head(Cells$Lat)
> [1] 41.25 40.75 40.75 40.75 40.75 40.25
>

The test is for values exceeding the -90 to 90 and -180 to 180 (or 0 to 
360) ranges, specifically sp:::.ll_sanity(bbox(coords)), but it is issued 
as a warning to accommodate edge cases. Here it fires because 
cbind(Cells$Lat, Cells$Lon) puts latitude in the first (x) column and 
longitude in the second (y) column, so longitude values of around 120 in 
the y position fail the -90 to 90 check; the columns should be (Lon, Lat). 
With longlat=TRUE the threshold is in kilometres, so 1.5 km is far 
narrower than the spacing of 0.5 degree cells - I would guess that 150 km 
is closer to your needs. In fact, here a Queen contiguity would look much 
like a planar distance criterion, so you can choose between the kinds of 
spatial processes you feel might be relevant.
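
Something along these lines would be a reasonable starting point - the 
150 km threshold is only my guess, and the column order assumes (Lon, 
Lat), so adjust both to your data:

library(spdep)
# assumes the Cells data frame from your messages; longitude first, latitude second
coords <- cbind(Cells$Lon, Cells$Lat)
# with longlat=TRUE the thresholds are Great Circle distances in km
nb150 <- dnearneigh(coords, 0, 150, longlat=TRUE)
summary(nb150)
nb150.w <- nb2listw(nb150, style="W", zero.policy=TRUE)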

>
>
> With regard to the formula in errorsarlm, I was following the method of
> Kissling and Carl (2008):

OK, so if you are following their lead, this is where the issues are 
coming from. They do write in the Word file of the supplementary material:

ols<-lm(data$organism1~data$rain+data$jungle)
sem.nb1.5.w<-errorsarlm(ols, listw=nb1.5.w)

but as far as I am aware, this never worked, because ols is an lm object, 
not a formula object. They should have written:

ols <- lm(organism1 ~ rain + jungle, data=data)
sem.nb1.5.w <- errorsarlm(organism1 ~ rain + jungle, data=data,
   listw=nb1.5.w)

You can get there by coercing ols to a formula for the first version:

ols <- lm(data$organism1~data$rain+data$jungle)
sem.nb1.5.w <- errorsarlm(formula(ols), listw=nb1.5.w)

and for the second version by providing the data= argument:

ols <- lm(organism1 ~ rain + jungle, data=data)
sem.nb1.5.w <- errorsarlm(formula(ols), data=data, listw=nb1.5.w)

but that isn't what they say. It is cleaner to say:

fm0 <- formula(organism1 ~ rain + jungle)
ols <- lm(fm0, data=data)
sem.nb1.5.w <- errorsarlm(fm0, data=data, listw=nb1.5.w)

and perhaps use update() methods on the formula object if needed.
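
For example, continuing from fm0 above, with a purely illustrative extra 
covariate ("depth" is not in their data, it only shows the mechanics):

fm1 <- update(fm0, . ~ . + depth)   # "depth" is hypothetical, for illustration only
sem2 <- errorsarlm(fm1, data=data, listw=nb1.5.w)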

>
> #######################
> #OLS model for organism 1
> ols<-lm(data$organism1~data$rain+data$jungle)
> summary(ols)
> res.ols <- residuals(ols)
>
> #######################
>
> #######################
> #SARerr model with neighbourhood distance 1.5 and coding style "W"
>
> #Specify SARerr model
> sem.nb1.5.w<-errorsarlm(ols, listw=nb1.5.w)
> summary(sem.nb1.5.w)
> res.sem.nb1.5.w <- residuals(sem.nb1.5.w)
>
> #######################
>
>
>
> With regard to the sparse matrix:
>
> If I am honest, I was unaware of what this does. I had seen it used in
> situations with large datasets, but methodologically I am unsure what it
> is doing - I ran it and it completed very quickly, and I was unsure why
> that was the case. I appreciate I should read up on the methodology
> behind this before reporting my results.

There are references on the help page - if you used the preferred Matrix 
method, which gives an exact result, Barry & Pace and LeSage & Pace are 
the relevant ones.
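
For example, re-using the objects above, something like this should 
request the exact sparse Matrix method (check the method= values listed 
on the help page):

sem.nb1.5.w <- errorsarlm(fm0, data=data, listw=nb1.5.w,
   method="Matrix", zero.policy=TRUE)
# "Matrix" computes the log determinant with sparse-matrix methods
# rather than dense eigenvalues, which matters for larger N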

Hope this clarifies,

Roger

>
> Thanks
>
> Chris
>
> -----Original Message-----
> From: Roger Bivand [mailto:Roger.Bivand at nhh.no]
> Sent: 12 December 2011 10:51
> To: Chris Mcowen
> Cc: r-sig-geo at r-project.org
> Subject: Re: [R-sig-Geo] Removing spatial autocorrelation - Memory limits
>
> On Mon, 12 Dec 2011, Chris Mcowen wrote:
>
>> Dear List,
>>
>> I am trying to model variation in fisheries catch. I have a large data
>> set of 36574 cells, with knowledge of the tonnes of fish caught in each
>> 0.5 degree global cell.
>>
>> At present I simply want to investigate whether certain areas (clusters
>> of cells) have higher numbers of fish than others.
>>
>> Due to the gridded nature of the data set I have significant spatial
>> autocorrelation.
>>
>> I have tried:
>>
>> REALM_gls <- gls(tonnesperkm_log~REALM, correlation = corGaus(form =~Lat +
>> Lon), data = Cells)
>>
>> coords<-cbind(Cells$Lat,Cells$Lon)
>> coords<-as.matrix(coords)
>> nb1.5<-dnearneigh(coords,0,1.5)
>> nb1.5.w<-nb2listw(nb1.5, glist=NULL, style="W", zero.policy=TRUE)
>> ols_lm<-lm(Cells$tonnesperkm_log~Cells$REALM)
>> ols_lm_error_REALM<-errorsarlm(ols_lm, listw=nb1.5.w, na.action = na.omit,
>> zero.policy = T, data = Cells)
>
> This is not right: the first argument to errorsarlm() is a formula object.
> Did you look at the help page? If you did, did you look at the method=
> argument, which offers, among others, sparse matrix methods for larger N?
> You may have no-neighbour observations - again, see the help page. In
> addition, you have geographical coordinates, so you should set the
> arguments to dnearneigh appropriately - here they assume a planar surface,
> but may be OK for recovering neighbours.
>
>> llk1 <- knn2nb(knearneigh(coords, k=1, longlat=FALSE))
>> col.nb.0.all <- dnearneigh(coords, 0, llk1)
>
> Wrong function argument again: llk1 is an nb object, not a scalar
> distance. Again, please do read the help pages carefully. You need to find
> the maximum of the first nearest neighbour distances, but I advise against
> this here.
>
> My advice: please do read the help pages carefully, and the relevant
> literature, for example ASDAR, as at www.asdar-book.org.
>
> Roger
>
>
>> col.nb.0.all
>> summary(col.nb.0.all)
>> ols_error_REALM2<-errorsarlm(ols_lm, listw=col.nb.0.all, na.action =
>> na.omit, zero.policy = T, data = Cells)
>>
>> However the memory required is large - 16GB.
>>
>> This won't run on my computer; I therefore have two questions:
>>
>> First, is the process I am doing correct - or can it be done in a more
>> efficient way?
>>
>> Second, can I take a subsample of the data, i.e. every other cell, and
>> run the analysis? Or find a way of subsetting and "re-joining"?
>>
>> Thanks in advance,
>>
>> Chris
>>
>
>

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no


