[R-sig-Geo] correlogram for categorical data

Roger Bivand Roger.Bivand at nhh.no
Sun Oct 22 16:54:29 CEST 2006


On Fri, 20 Oct 2006 White.Denis at epamail.epa.gov wrote:

> Roger Bivand <Roger.Bivand at nhh.no> wrote on 2006-10-19 00:05:31:
> 
> > On Wed, 18 Oct 2006 White.Denis at epamail.epa.gov wrote:
> >
> > > I have a 1000 x 1000 grid of categorical values (nine of these) and
> want
> > > to compute and plot a correlogram.  Function cell2nb() will take a
> while
> > > it appears but if that succeeds then can methods of sp.correlogram()
> be
> > > used on categorical data?  What are "style" options in
> sp.correlogram()?
> > >
> >
> > Sorry, could you say how a correlogram might be constructed for
> > categorical values (eg. land cover)? Wouldn't join counts be a more
> > natural choice? joincoint.multi() in spdep has a Jtot value of total
> > different category joins, so using that with different lags of a
> cell2nb()
> > neighbour list might work. However, the 1M cell grid is pretty large
> for
> > cell2nb(), using dnearneigh() on cell centres may be faster and scale
> > better, (and other possibilities should exist) and nblag() will be
> > definitely sub-optimal in this setting. For join counts, the "B"
> weights
> > style is the obvious one to chose. Note that joincoint.multi() is not
> > coded in C.
> >
> > This would need trying on a small subset and scaling up - I think that
> 
> > alternative routes to constructing the lagged neighbour lists would be
> 
> > preferable.
> >
> > Roger
> >
> 
> I wrote a short function to compute proportion agreement between classes
> at different lags, R code below.  Because it's a brute force O(n^2)
> algorithm, it takes way too long in R for datasets with more than some
> 10's of points.  So I wrote a C version that took about 20 hours for 1M
> points on a 2GHz linux computer.  Subsampling from the 1000x1000 image
> reduced time considerably, naturally.
> 
> I haven't looked at Cliff and Ord or other texts on these kinds of
> statistics for a long time and I don't know whether this simple
> proportion agreement measure is closely related to a multinomial
> joincount or not.  And so I don't know what the test for non-randomness
> is or what a variance measure is, etc.  It does seem to portray the
> spatial pattern in agreement though.

It would be the non-free sampling as in joincount.test() in spdep, or 
possibly free sampling, which would be simpler. These are cumulated 
same-color tests, so som adaptation would be needed. This isn't the 
biggest problem, though, since the resolution of the grid and the steps in 
the "joincountogram" will get mixed up - very high join counts at small 
distances when the grid resolution is small compared the the effective 
scale of variation of the phenomena. 

Roger

> 
> isotaxogram <- function (pts, nlags=10, maxdist)
> 
> # Calculate proportion agreement of categorical point data
> # at successive lags
> #
> # pts      matrix or data frame with columns/names
> #          "x", "y", "val"
> # nlags    number of lags in the output function
> # maxdist  maximum distance over which to compute the function;
> #          needs to be specified
> 
> {
>     agree <- num <- lag <- rep (0, nlags)
>     for (i in 1:(nrow(pts)-1))
>     {
>         x1 <- pts[i, "x"]
>         y1 <- pts[i, "y"]
>         for (j in (i+1):nrow(pts))
>         {
>             x2 <- pts[j, "x"]
>             y2 <- pts[j, "y"]
>             d <- sqrt ((x2-x1)^2 + (y2-y1)^2)
>             d <- round (d / maxdist * nlags)
>             a <- as.integer (pts[i, "val"] == pts[j, "val"])
>             if (d %in% 1:nlags)
>             {
>                 agree[d] <- agree[d] + a
>                 num[d] <- num[d] + 1
>             }
>         }
>     }
>     for (k in 1:nlags)
>     {
>         lag[k] <- k * (maxdist / nlags)
>         if (num[k] != 0) agree[k] <- agree[k] / num[k]
>     }
>     return (list (agree=agree, lag=lag, num=num))
> }
> 

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no




More information about the R-sig-Geo mailing list