[R] R: gstat problem with lidar data

Dylan Beaudette dylan.beaudette at gmail.com
Wed Jul 16 23:22:54 CEST 2008


On Wednesday 16 July 2008, Alessandro wrote:
> Hey Dylan,
>
> Thank you. For my PhD I would like to test TIN (ok, with ArcMap), IDW (ok,
> with ArcMap) and kriging (and other models if possible) to create the DSM,
> DEM, and DCM (DSM-DEM). I tried gstat in IDRISI, but my PC runs out of
> memory.
> I would like to use gstat in R to create surface maps (in grid format for
> IDRISI or ArcMap). Unfortunately I have the same problem in R (out of
> memory), because the dataset is big. Therefore I would like to create a
> random subsample from the more than 5,000,000 points.
> Here is my code (sorry, I am brand new to R).
>
> Data type (in *.txt format)
>
> X		Y		Z
> .......	.......	........
> .......	.......	........
>
> testground <- read.table(file="c:/work_LIDAR_USA/R_kriging/ground26841492694149.txt",
>                          header=TRUE, sep=" ")
> summary(testground)
> plot(testground[,1], testground[,2])
> library(sp)
> class(testground)
> coordinates(testground) <- ~X+Y
> library(gstat)
> class(testground)
> V <- variogram(z~1, testground)
>
> When I reach this step, "out of memory" appears.
>
> If you can help me it would be a great pleasure, because my work has stopped.
>
> Ale
>

Hi Ale. Please remember to CC the list next time.

Since R holds its data in memory (for the most part), you should summarize or 
thin your data first, and only then load it into R.
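One way to do the thinning entirely in R, without ever holding all 5,000,000+ 
points at once, is to read the file in chunks and keep a random fraction of 
each chunk. This is only a rough, untested sketch: the chunk size and the 10% 
keep-rate are placeholders, and it assumes the same space-separated format as 
your read.table() call.

# file name taken from your example; adjust as needed
in_file    <- "c:/work_LIDAR_USA/R_kriging/ground26841492694149.txt"
chunk_size <- 100000   # lines read per pass
keep_frac  <- 0.1      # fraction of each chunk to retain

con    <- file(in_file, open = "r")
header <- readLines(con, n = 1)        # first line holds the column names
keep   <- list()

repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break
  take  <- sample(length(lines), size = ceiling(keep_frac * length(lines)))
  keep[[length(keep) + 1]] <- lines[take]
}
close(con)

# parse only the retained lines, re-using the original column names
testground <- read.table(textConnection(unlist(keep)), header = FALSE,
                         sep = " ", col.names = strsplit(header, " ")[[1]])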

If you can install GRASS, I would highly recommend using the r.in.xyz command 
to pre-grid your data to a reasonable cell size, such that the resulting 
raster will fit into memory.
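If GRASS is not an option but you can get at least a tile or subset of the 
points into R (see the sampling example below), the same pre-gridding idea can 
be sketched in plain R. The data frame name 'pts', its column names, and the 
cell size are only assumptions for the example.

# pre-grid a point cloud in plain R: assign each point to a grid cell and
# average z within cells (roughly what r.in.xyz with method=mean does)
# 'pts' is assumed to be a data.frame with columns x, y, z; the cell size
# is just an example value in map units
cell <- 1
ix   <- floor(pts$x / cell)
iy   <- floor(pts$y / cell)

grid_mean <- aggregate(list(z = pts$z),
                       by = list(ix = ix, iy = iy), FUN = mean)

# cell-centre coordinates for the gridded values
grid_mean$x <- (grid_mean$ix + 0.5) * cell
grid_mean$y <- (grid_mean$iy + 0.5) * cell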

If you cannot install GRASS, but can somehow manage to get the raw data into 
R, sampling random rows would work:

# make some data:
x <- 1:100000

# just some of the data
sample(x, 100)

# use this idea to extract x,y,z triplets
# from some fake data:
d <- data.frame(x=rnorm(100), y=rnorm(100), z=rnorm(100))

# select 10 random rows:
rand_rows <- sample(1:nrow(d), 10)

# just the selected rows:
d.small <- d[rand_rows, ]

Keep in mind that you will need enough memory to hold the original data AND 
your subset at the same time. Once you have the subset, discard the original 
data with rm().
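Once you have a subset that fits in memory, the gstat step from your example 
should go through. Here is a sketch, assuming the subset is the d.small data 
frame from above with columns named x, y and z (swap in your actual column 
names); the cutoff, bin width, and variogram model parameters are placeholders 
to be tuned to your study area.

library(sp)
library(gstat)

# promote the subset to a SpatialPointsDataFrame
coordinates(d.small) <- ~ x + y

# restricting the cutoff (maximum lag distance) and bin width also limits
# the number of point pairs gstat has to consider
v <- variogram(z ~ 1, d.small, cutoff = 100, width = 5)
plot(v)

# fit a simple spherical model as a starting point
v.fit <- fit.variogram(v, vgm(psill = 1, model = "Sph", range = 50, nugget = 0))
v.fit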

As for the statistical implications of randomly sampling a point cloud for 
variogram analysis, someone smarter than I may be able to help.

Cheers,

Dylan



>
>
> -----Original Message-----
> From: Dylan Beaudette [mailto:dylan.beaudette at gmail.com]
> Sent: Wednesday, 16 July 2008 12:45
> To: r-help at r-project.org
> Cc: Alessandro
> Subject: Re: [R] gstat problem with lidar data
>
> On Wednesday 16 July 2008, Alessandro wrote:
> > Hey,
> >
> >
> >
> > I am a PhD student in forestry science, and I am brand new to R. I am
> > working with lidar data (a point cloud with X, Y and Z values). I would
> > like to create a spatial map by kriging the point cloud. My problem is
> > the large dataset (over 5,000,000 points): I always run out of memory.
> >
> > Is there a script to create a subset, or to modify the radius (cutoff)
> > of the variogram?
>
> Do you have any reason to prefer kriging over some other, less intensive
> method such as RST (regularized splines with tension)?
>
> Check out GRASS or GMT for ideas on how to grid such a massive point set.
> Specifically the r.in.xyz and v.surf.rst modules from GRASS.
>
> Cheers,



-- 
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341


