[R-sig-Geo] Large dataset on Ripley's K function

Dan Putler dan.putler at sauder.ubc.ca
Fri Nov 23 18:38:17 CET 2007


Hi Sisi,

A three continent study area is very large. It strikes me that border
effects would likely matter, so would probably need to be corrected for.
Although, what the correct border definition is will depend on what your
data relate to. Consequently, the natural question to ask is what kinds
of objects do your points represent?

Dan

On Fri, 2007-23-11 at 15:12 +0100, Sisi wrote:
> Dear all,
>  
> Thanks for your quick reply!
>  
> For Roger, I checked the sessioninfo and mine is same as yours. My
> object size is indeed quite big, it is 275608. So I simplified the
> polygon and erased all the small islands, only hold the main land of
> Asia, Europe and Africa. After I loaded the polygon window and points,
> I checked the object size. It becomes  86768, which seems quite ok
> (compared with yours 57760). I ran the function again, and still
> doesn't work, the code and errors are as follow:
>  
> sessionInfo()
> R version 2.6.0 (2007-10-03) 
> i386-pc-mingw32 
> 
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods
> base     
> 
> other attached packages:
> [1] RColorBrewer_1.0-2 spatstat_1.12-1    mgcv_1.3-27
> maptools_0.6-19    sp_0.9-16         
> [6] foreign_0.8-23    
> 
> loaded via a namespace (and not attached):
> [1] grid_2.6.0     lattice_0.16-5 tools_2.6.0   
> 
> 
> #Enlarge the memory
> memory.limit(size=4095)
> # creat polygon to be used as a window in ppp
> library(maptools)
> library(sp)
> library(spatstat)
> a <- readShapePoly("D:/5Bird/Result/New2Project.shp"[1]) 
> plot(a)
> 
> # Define the window
> W <- as(as(a, "SpatialPolygons"), "owin")
> plot(W)
> 
> # Import point data and creat ppp
> cases <- coordinates(readShapePoints("D:/5Bird/Result/BirdPoint.shp"))
> str(cases)
> 
> #num [1:3345, 1:2] 5431396 5305812 5343499 1975828 2128985 ...
>  #- attr(*, "dimnames")=List of 2
>   #..$ : NULL
>   #..$ : chr [1:2] "coords.x1" "coords.x2"
> 
> colnames(cases)<-c("x","y") 
> p1 <- ppp(cases[,1], cases[,2], window=W)
> plot(p1)
> points(cases, pch=20,col="red")
> 
> object.size(p1)
> #[1] 86768
> 
> 
> #Ripley's K function
> K <- Kest(p1,nlarge=Inf)
> #Error: cannot allocate vector of size 38.6 Mb
> plot(K)
> #Generate envelope 
> en <- envelope(p1, fun=Kest(p1,nlarge=3500), nsim=10) 
> #Error: cannot allocate vector of size 38.6 Mb
> plot(en, main="Envelopes of K function based on CSR")
>  
> For the question about the geographical coordinates, the projection
> system I'm using is Equidistant Cylindrical , the unit is meter. Just
> considering this projection has no distortion in distance.
>  
> For Don, when I set nlarge=3500, the error was: "Error: cannot
> allocate vector of size 381.8 Mb". When I set nlarge=Inf, the error
> became:"Error: cannot allocate vector of size 38.6 Mb". I cannot
> figure out the cause of the errors. When you mentioned edge
> correction, I get a question to consult. Do I still need edge
> correction if my study area covers Asia, Europe and Africa continents.
> I mean the boundary is the real boundary, and beyond it is the sea.
>  
> Thank you for your time and best regards,
> Sisi
>  
>  
>  
> On Thu, 22 Nov 2007, Roger Bivand  wrote:
>  
> This number is not large in itself, and:
> 
> set.seed(1)
> xy <- runifpoint(3500)
> res <- Kest(xy, nlarge=3500)
> 
> works without any problems on a 1GB system:
> 
> > sessionInfo() 
> R version 2.6.0 (2007-10-03)
> i386-pc-mingw32
> 
> locale:
> LC_COLLATE=Norwegian (Bokmål)_Norway.1252;LC_CTYPE=Norwegian
> (Bokmål)_Norway.1252;LC_MONETARY=Norwegian
> (Bokmål)_Norway.1252;LC_NUMERIC=C;LC_TIME=Norwegian
> (Bokmål)_Norway.1252 
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] spatstat_1.12-3 mgcv_1.3-29
> 
> (it is always helpful to include the output of sessionInfo()) 
> 
> However note that here:
> 
> > xy
>  planar point pattern: 3500 points
> window: rectangle = [0, 1] x [0, 1] units
> > object.size(xy)
> [1] 57760
> 
> only has the simplest window, and my guess is that your window mask
> is 
> much richer (continental shorelines? raster landmass mask?). If so,
> edge
> correction will probably involve much more memory use. If you are
> using a
> vector shoreline, and this is the reason for the problems, have you
> tried 
> using a raster mask instead?
> 
> I have tried using coarse GSHHS shorelines without difficulties
> (Rgshhs in
> maptools), but very possibly with a shoreline with too many details,
> you
> might see problems, or with a raster mask with too high resolution. 
> 
> Have you considered the problems involved in using geographical
> coordinates?
> 
> Roger
> 
> On 22/11/2007, Putler, Dan <dan.putler at sauder.ubc.ca> wrote:
>         Hi Sisi,
>         
>         By setting nlarge to 3500 you are creating problems (given
>         that you have just over 3300 observations) since you may be
>         using either the isotropic or Ripley's method for border
>         correction. You want to use the border method (which is what
>         was used when nlarge was at its original value of 3000). To
>         quote the documentation of the function: 
>         
>         If the point pattern X contains more than about 3000 points,
>         the isotropic and translation edge cor-
>         rections can be computationally prohibitive. The computations
>         for the border method are much
>         faster, and are statistically efficient when there are large
>         numbers of points. Accordingly, if the 
>         number of points in X exceeds the threshold nlarge, then only
>         the border correction will be com-
>         puted. Setting nlarge=Inf will prevent this from happening.
>         Setting nlarge=0 is equivalent
>         to selecting only the border correction with
>         correction="border". 
>         
>         My guess is that Roger's suggested approach of simplifying the
>         boundaries is correct (my experience is that his suggestions
>         are always on target), which would allow you to take advantage
>         of the other methods, but the using the border method only can
>         be justified in your case (and a lot easier to implement). 
>         
>         Dan
>         
>         
> 
-- 
Dan Putler
Sauder School of Business
University of British Columbia




More information about the R-sig-Geo mailing list