[R-sig-Geo] Problem with size of dataset

Thu Jul 16 10:45:59 CEST 2009

Dear Emmanuel,

I have the same problem. I either can not run processing with large data set in R or I can not even
load such data to R. Then, if I want to do any geostatistics, it takes forever. R (gstat/geoR) is
simply not that efficient with large spatial data as e.g. GIS software.

What you can definitively try is to subset your point data randomly by using e.g.:

> points.sub <- points[runif(length(points at data[[1]]))<0.1]  # 10% sample!

This will allow you to fit variograms etc.

Then, if you really want to interpolate all of your 157k points, you might consider using SAGA. For
example, you can also sub-set randomly points from a shape file:

# too many points; subset to 5% ("Split Shapes Layer Randomly" in SAGA):
> rsaga.geoprocessor(lib="shapes_tools", module=17, param=list(SHAPES="complete.shp",
A="part_A.shp", B="part_B.shp", PERCENT=5))
> ground.sample <- readOGR("part_B.shp","part_B")

# Learn more about geostatistics in SAGA:
> rsaga.get.modules(lib="geostatistics_kriging")
$geostatistics_kriging
   code                           name
1     0          Ordinary Kriging (VF)
2     1  Ordinary Kriging (VF, Global)
3     2         Universal Kriging (VF)
4     3 Universal Kriging (VF, Global)
5     4        Semivariogram (Dialog))
6     5               Ordinary Kriging
7     6      Ordinary Kriging (Global)
8     7              Universal Kriging
9     8     Universal Kriging (Global)
10   NA                           <NA>
11   NA                           <NA>

# Read the mask map:
> gridmaps <- readGDAL("maskmap.asc")
> cell.size <- gridmaps at grid@cellsize[[1]]
# Ordinary kriging in SAGA:
> rsaga.geoprocessor(lib="geostatistics_kriging", module=5, param=list(GRID="var_OK.sgrd",
SHAPES="var.shp", BVARIANCE=F, BLOCK=F, FIELD=1, BLOG=F, MODEL=1, TARGET=0, NPOINTS_MIN=10,
NPOINTS_MAX=60, NUGGET=rvgm.Pb$psill[1], SILL=1.65, RANGE=1238, MAXRADIUS=50000,
USER_CELL_SIZE=cell.size, USER_X_EXTENT_MIN=gridmaps at bbox[1,1]+cell.size/2,
USER_X_EXTENT_MAX=gridmaps at bbox[1,2]-cell.size/2, USER_Y_EXTENT_MIN=gridmaps at bbox[2,1]+cell.size/2,
USER_Y_EXTENT_MAX=gridmaps at bbox[2,2]-cell.size/2))
# the same way you can run regression-kriging/universal kriging;

You will soon find out that there is a big difference in the efficiency between SAGA and R - SAGA
will interpolate your 157k points within few minutes or less. On the other hand, SAGA has very very
limited geostatistical functionality (for example it can not fit variograms etc.), so what you
really need is a combination of SAGA and R!

Here are more examples:
http://geomorphometry.org/view_scripts.asp?id=24 

HTH,

T. Hengl
http://spatial-analyst.net 

> -----Original Message-----
> From: r-sig-geo-bounces at stat.math.ethz.ch [mailto:r-sig-geo-bounces at stat.math.ethz.ch] On Behalf
> Of Poizot Emmanuel
> Sent: Thursday, July 16, 2009 9:55 AM
> To: r-sig-geo at stat.math.ethz.ch
> Subject: [R-sig-Geo] Problem with size of dataset
> 
> Dear all,
> 
> I would like to perform a geostatistical analysis using R.
> To do so, I'm classicaly use geoR ou GSTAT packages.
> In the present case, I have a dataset of around 157000 locations where I
> have a value (depth).
> I've been not able to create both geodata or gstat valid R object
> because apparently of the size of the dataset.
> DOes anybody have an idear of how to conduct such study with such
> dataset size ?
> Regards
> 
> --
> Cordialement
> 
> ------------------------------------------------
> Emmanuel Poizot
> Cnam/Intechmer
> B.P. 324
> 50103 Cherbourg Cedex
> 
> Phone (Direct) : (00 33)(0)233887342
> Fax : (00 33)(0)233887339
> ------------------------------------------------