[R-sig-Geo] correlogram for categorical data
White.Denis at epamail.epa.gov
White.Denis at epamail.epa.gov
Sat Oct 21 01:32:13 CEST 2006
Roger Bivand <Roger.Bivand at nhh.no> wrote on 2006-10-19 00:05:31:
> On Wed, 18 Oct 2006 White.Denis at epamail.epa.gov wrote:
>
> > I have a 1000 x 1000 grid of categorical values (nine of these) and
want
> > to compute and plot a correlogram. Function cell2nb() will take a
while
> > it appears but if that succeeds then can methods of sp.correlogram()
be
> > used on categorical data? What are "style" options in
sp.correlogram()?
> >
>
> Sorry, could you say how a correlogram might be constructed for
> categorical values (eg. land cover)? Wouldn't join counts be a more
> natural choice? joincoint.multi() in spdep has a Jtot value of total
> different category joins, so using that with different lags of a
cell2nb()
> neighbour list might work. However, the 1M cell grid is pretty large
for
> cell2nb(), using dnearneigh() on cell centres may be faster and scale
> better, (and other possibilities should exist) and nblag() will be
> definitely sub-optimal in this setting. For join counts, the "B"
weights
> style is the obvious one to chose. Note that joincoint.multi() is not
> coded in C.
>
> This would need trying on a small subset and scaling up - I think that
> alternative routes to constructing the lagged neighbour lists would be
> preferable.
>
> Roger
>
I wrote a short function to compute proportion agreement between classes
at different lags, R code below. Because it's a brute force O(n^2)
algorithm, it takes way too long in R for datasets with more than some
10's of points. So I wrote a C version that took about 20 hours for 1M
points on a 2GHz linux computer. Subsampling from the 1000x1000 image
reduced time considerably, naturally.
I haven't looked at Cliff and Ord or other texts on these kinds of
statistics for a long time and I don't know whether this simple
proportion agreement measure is closely related to a multinomial
joincount or not. And so I don't know what the test for non-randomness
is or what a variance measure is, etc. It does seem to portray the
spatial pattern in agreement though.
isotaxogram <- function (pts, nlags=10, maxdist)
# Calculate proportion agreement of categorical point data
# at successive lags
#
# pts matrix or data frame with columns/names
# "x", "y", "val"
# nlags number of lags in the output function
# maxdist maximum distance over which to compute the function;
# needs to be specified
{
agree <- num <- lag <- rep (0, nlags)
for (i in 1:(nrow(pts)-1))
{
x1 <- pts[i, "x"]
y1 <- pts[i, "y"]
for (j in (i+1):nrow(pts))
{
x2 <- pts[j, "x"]
y2 <- pts[j, "y"]
d <- sqrt ((x2-x1)^2 + (y2-y1)^2)
d <- round (d / maxdist * nlags)
a <- as.integer (pts[i, "val"] == pts[j, "val"])
if (d %in% 1:nlags)
{
agree[d] <- agree[d] + a
num[d] <- num[d] + 1
}
}
}
for (k in 1:nlags)
{
lag[k] <- k * (maxdist / nlags)
if (num[k] != 0) agree[k] <- agree[k] / num[k]
}
return (list (agree=agree, lag=lag, num=num))
}
More information about the R-sig-Geo
mailing list