[R-sig-Geo] correlogram for categorical data
Roger Bivand
Roger.Bivand at nhh.no
Sun Oct 22 16:54:29 CEST 2006
On Fri, 20 Oct 2006 White.Denis at epamail.epa.gov wrote:
> Roger Bivand <Roger.Bivand at nhh.no> wrote on 2006-10-19 00:05:31:
>
> > On Wed, 18 Oct 2006 White.Denis at epamail.epa.gov wrote:
> >
> > > I have a 1000 x 1000 grid of categorical values (nine of these) and
> want
> > > to compute and plot a correlogram. Function cell2nb() will take a
> while
> > > it appears but if that succeeds then can methods of sp.correlogram()
> be
> > > used on categorical data? What are "style" options in
> sp.correlogram()?
> > >
> >
> > Sorry, could you say how a correlogram might be constructed for
> > categorical values (eg. land cover)? Wouldn't join counts be a more
> > natural choice? joincoint.multi() in spdep has a Jtot value of total
> > different category joins, so using that with different lags of a
> cell2nb()
> > neighbour list might work. However, the 1M cell grid is pretty large
> for
> > cell2nb(), using dnearneigh() on cell centres may be faster and scale
> > better, (and other possibilities should exist) and nblag() will be
> > definitely sub-optimal in this setting. For join counts, the "B"
> weights
> > style is the obvious one to chose. Note that joincoint.multi() is not
> > coded in C.
> >
> > This would need trying on a small subset and scaling up - I think that
>
> > alternative routes to constructing the lagged neighbour lists would be
>
> > preferable.
> >
> > Roger
> >
>
> I wrote a short function to compute proportion agreement between classes
> at different lags, R code below. Because it's a brute force O(n^2)
> algorithm, it takes way too long in R for datasets with more than some
> 10's of points. So I wrote a C version that took about 20 hours for 1M
> points on a 2GHz linux computer. Subsampling from the 1000x1000 image
> reduced time considerably, naturally.
>
> I haven't looked at Cliff and Ord or other texts on these kinds of
> statistics for a long time and I don't know whether this simple
> proportion agreement measure is closely related to a multinomial
> joincount or not. And so I don't know what the test for non-randomness
> is or what a variance measure is, etc. It does seem to portray the
> spatial pattern in agreement though.
It would be the non-free sampling as in joincount.test() in spdep, or
possibly free sampling, which would be simpler. These are cumulated
same-color tests, so som adaptation would be needed. This isn't the
biggest problem, though, since the resolution of the grid and the steps in
the "joincountogram" will get mixed up - very high join counts at small
distances when the grid resolution is small compared the the effective
scale of variation of the phenomena.
Roger
>
> isotaxogram <- function (pts, nlags=10, maxdist)
>
> # Calculate proportion agreement of categorical point data
> # at successive lags
> #
> # pts matrix or data frame with columns/names
> # "x", "y", "val"
> # nlags number of lags in the output function
> # maxdist maximum distance over which to compute the function;
> # needs to be specified
>
> {
> agree <- num <- lag <- rep (0, nlags)
> for (i in 1:(nrow(pts)-1))
> {
> x1 <- pts[i, "x"]
> y1 <- pts[i, "y"]
> for (j in (i+1):nrow(pts))
> {
> x2 <- pts[j, "x"]
> y2 <- pts[j, "y"]
> d <- sqrt ((x2-x1)^2 + (y2-y1)^2)
> d <- round (d / maxdist * nlags)
> a <- as.integer (pts[i, "val"] == pts[j, "val"])
> if (d %in% 1:nlags)
> {
> agree[d] <- agree[d] + a
> num[d] <- num[d] + 1
> }
> }
> }
> for (k in 1:nlags)
> {
> lag[k] <- k * (maxdist / nlags)
> if (num[k] != 0) agree[k] <- agree[k] / num[k]
> }
> return (list (agree=agree, lag=lag, num=num))
> }
>
--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no
More information about the R-sig-Geo
mailing list