[R-sig-Geo] Count occurrences less memory expensive than superimpose function in several spatial objects

Sat Aug 22 13:04:13 CEST 2020

Alexandre Santos writes:

 > I'll like to read several shapefiles, count occurrences in the same
 > coordinate and create a final shapefile with a threshold number of
 > occurrences. I try to convert the shapefiles in ppp object (because I
 > have some part of my data set in shapefile and another in ppp objects)
 > and applied superimpose function [.... ]

The function 'superimpose' in the spatstat package is generic, with methods for 'ppp' and 'default'.

Your example code applies 'superimpose' to a list of objects of class 'ppp'.
This uses the method 'superimpose.ppp' which applies to objects of class 'ppp'
and constructs a new object of class 'ppp'. This task includes computing the appropriate "observation window"
(a component of the 'ppp' structure) from the observation windows of the input patterns.
There is an option in 'superimpose.ppp' to specify the observation window of the result.
You didn't use this option, so you're expecting the function 'superimpose.ppp' to calculate the
appropriate window. When you have many objects with complicated windows, this will take a lot of time.
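As a rough illustration of the window option mentioned above, here is a minimal sketch that passes a precomputed window through the 'W' argument of 'superimpose'. The two small patterns and the rectangle 'W' are invented for the example; in practice 'W' would be whatever region you already know contains all of your data.

```r
library(spatstat)  # in newer releases these functions live in spatstat.geom

# Two small hypothetical patterns with different rectangular windows
P1 <- ppp(x = c(0.2, 0.5), y = c(0.3, 0.5), window = owin(c(0, 1), c(0, 1)))
P2 <- ppp(x = c(0.5, 1.5), y = c(0.5, 0.8), window = owin(c(0, 2), c(0, 1)))
Plist <- list(P1, P2)

# Supply the window of the result explicitly (assumed known in advance),
# so that it does not have to be derived from the input windows
W <- owin(c(0, 2), c(0, 1))
X <- do.call(superimpose, c(Plist, list(W = W)))
npoints(X)
```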

To make this go faster, you could simply extract the (x,y) coordinates of the objects using coords() or as.data.frame(),
and then call 'superimpose' on these data frames. This invokes 'superimpose.default', which concatenates the
(x,y) coordinate lists very quickly.
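The concatenation step can be sketched in base R as follows. This uses plain 'rbind' as a stand-in for 'superimpose.default', and the small data frames are hypothetical substitutes for what coords(P) would give for each pattern.

```r
# Hypothetical coordinate tables, standing in for coords(P) of each pattern
xy_list <- list(
  data.frame(x = c(1, 2, 1), y = c(5, 6, 5)),
  data.frame(x = c(1, 3),    y = c(5, 7))
)

# Concatenate all (x, y) rows into one data frame
xy_all <- do.call(rbind, xy_list)
nrow(xy_all)
```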

If I understand correctly, your ultimate goal is to have a list of the unique (x,y) points and their multiplicities.

If you have already superimposed (concatenated) the (x,y) coordinate lists, then you can calculate the multiplicities
with 'table', or with the spatstat function 'uniquemap' (the latter is extremely fast).
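For a small illustration of counting multiplicities on an already concatenated coordinate table, here is a base-R sketch. It uses 'aggregate' rather than 'table' so that the coordinates stay numeric; the data are invented, and points are treated as identical only when their coordinates are exactly equal.

```r
# Hypothetical concatenated coordinates; (1, 5) occurs three times
xy_all <- data.frame(x = c(1, 2, 1, 1, 3), y = c(5, 6, 5, 5, 7))

# Give every point weight 1, then sum the weights per distinct (x, y)
xy_all$m <- 1L
counts <- aggregate(m ~ x + y, data = xy_all, FUN = sum)
counts
```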

However, you don't need to concatenate all the coordinates of all the point patterns before calculating multiplicities.
In big data applications it would be more efficient to process each point pattern dataset first,
determining the unique (x,y) points and their multiplicities within each point pattern,
and then to merge the results from the different point patterns. Something like this,
if 'Plist' is your list of point patterns:

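       # process each point pattern
       Vlist <- lapply(unname(Plist),
         function(P) {
                xy <- as.data.frame(P)[, c("x","y")]
                # um[i] is the index of the first point identical to point i
                um <- uniquemap(xy)
                # TRUE for the first occurrence of each distinct location
                isun <- (um == seq_along(um))
                # plain counts, in the same order as the rows selected by isun
                mul <- as.integer(table(um))
                return(cbind(xy[isun, , drop=FALSE], m=mul))
         })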
       # concatenate results from all patterns
       V <- do.call(rbind, Vlist)
       # find unique points
       um <- uniquemap(V[,c("x","y")])
       isun <- (um == seq_along(um))
       U <- V[isun, c("x", "y")]
       m <- tapply(V$m, factor(um), sum)

Then U contains the unique locations and m contains the corresponding multiplicities, in the same order as the rows of U.
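To get from U and m to the thresholded result asked about in the original question, something along these lines might do. The values of U and m below are invented so that the snippet runs on its own, and 'threshold' is a hypothetical cut-off; writing the result out as a shapefile is not shown.

```r
# Hypothetical stand-ins for the U and m computed above
U <- data.frame(x = c(1, 2, 3), y = c(5, 6, 7))
m <- c(3, 1, 2)

threshold <- 2                 # hypothetical minimum number of occurrences
mv <- as.vector(m)             # drop any names or dimensions left by tapply
keep <- mv >= threshold

# Locations that occur at least 'threshold' times, with their counts
result <- cbind(U[keep, , drop = FALSE], m = mv[keep])
result
```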

Prof Adrian Baddeley HonDSc FAA

John Curtin Distinguished Professor

School of Electrical Engineering, Computing and Mathematical Sciences

Curtin University, Perth, Western Australia

I work Wednesdays and Thursdays only
