[R-sig-Geo] correlogram for categorical data

White.Denis at epamail.epa.gov White.Denis at epamail.epa.gov
Sat Oct 21 01:32:13 CEST 2006


Roger Bivand <Roger.Bivand at nhh.no> wrote on 2006-10-19 00:05:31:

> On Wed, 18 Oct 2006 White.Denis at epamail.epa.gov wrote:
>
> > I have a 1000 x 1000 grid of categorical values (nine of these) and
want
> > to compute and plot a correlogram.  Function cell2nb() will take a
while
> > it appears but if that succeeds then can methods of sp.correlogram()
be
> > used on categorical data?  What are "style" options in
sp.correlogram()?
> >
>
> Sorry, could you say how a correlogram might be constructed for
> categorical values (eg. land cover)? Wouldn't join counts be a more
> natural choice? joincoint.multi() in spdep has a Jtot value of total
> different category joins, so using that with different lags of a
cell2nb()
> neighbour list might work. However, the 1M cell grid is pretty large
for
> cell2nb(), using dnearneigh() on cell centres may be faster and scale
> better, (and other possibilities should exist) and nblag() will be
> definitely sub-optimal in this setting. For join counts, the "B"
weights
> style is the obvious one to chose. Note that joincoint.multi() is not
> coded in C.
>
> This would need trying on a small subset and scaling up - I think that

> alternative routes to constructing the lagged neighbour lists would be

> preferable.
>
> Roger
>

I wrote a short function to compute proportion agreement between classes
at different lags, R code below.  Because it's a brute force O(n^2)
algorithm, it takes way too long in R for datasets with more than some
10's of points.  So I wrote a C version that took about 20 hours for 1M
points on a 2GHz linux computer.  Subsampling from the 1000x1000 image
reduced time considerably, naturally.

I haven't looked at Cliff and Ord or other texts on these kinds of
statistics for a long time and I don't know whether this simple
proportion agreement measure is closely related to a multinomial
joincount or not.  And so I don't know what the test for non-randomness
is or what a variance measure is, etc.  It does seem to portray the
spatial pattern in agreement though.

isotaxogram <- function (pts, nlags=10, maxdist)

# Calculate proportion agreement of categorical point data
# at successive lags
#
# pts      matrix or data frame with columns/names
#          "x", "y", "val"
# nlags    number of lags in the output function
# maxdist  maximum distance over which to compute the function;
#          needs to be specified

{
    agree <- num <- lag <- rep (0, nlags)
    for (i in 1:(nrow(pts)-1))
    {
        x1 <- pts[i, "x"]
        y1 <- pts[i, "y"]
        for (j in (i+1):nrow(pts))
        {
            x2 <- pts[j, "x"]
            y2 <- pts[j, "y"]
            d <- sqrt ((x2-x1)^2 + (y2-y1)^2)
            d <- round (d / maxdist * nlags)
            a <- as.integer (pts[i, "val"] == pts[j, "val"])
            if (d %in% 1:nlags)
            {
                agree[d] <- agree[d] + a
                num[d] <- num[d] + 1
            }
        }
    }
    for (k in 1:nlags)
    {
        lag[k] <- k * (maxdist / nlags)
        if (num[k] != 0) agree[k] <- agree[k] / num[k]
    }
    return (list (agree=agree, lag=lag, num=num))
}




More information about the R-sig-Geo mailing list