[R-sig-Geo] cokriging question

Edzer Pebesma edzer.pebesma at uni-muenster.de
Fri Sep 19 18:57:20 CEST 2008


Dear Tom,

Hengl, T. wrote:
>  
> Dear Edzer,
>  
> I do have a similar problem (extensive computation time, memory limits). I would like to run interpolation of 60,000 points for 500,000 new points using OK and nmax of 30 closest points (I am using a windowsxp machine with 2GB RAM).
>  
> But I have doubts that it can ever run in less than few hours in gstat. I mean, at each new location, the algorithm has to derive distances to all locations (60,000^2 matrix) and then find the 30 closest ones. There is no more intelligent way to pick up the closest points - or is there?
>   
There is, and it is being used: the short name is spatial index and 
quadtree; you'll find links and explanation on
http://www.gstat.org/manual/node8.html#SECTION00351000000000000000

Both computers & geosciences papers on gstat mention this.

To see the effect of this, try to enlarge split to a number larger than 
the number of observations, as in

krige(log(zinc)~1, meuse, meuse[1,], vgm(1, "Exp", 300), set = 
list(split=1000))

Here, the bucket size is 1000, and all observations will be ranked 
according to distance to the prediction location (which does not require 
an n^2 matrix), using quick sort.

Other software, such as GSLIB (iirc) put a course regular grid over the 
domain to speed up neighbourhood selection.
--
Edzer

>  
> Thanx,
>  
> Tom
>  
>
> ________________________________
>
> From: r-sig-geo-bounces at stat.math.ethz.ch on behalf of Dave Depew
> Sent: Fri 9/19/2008 5:00 PM
> To: Edzer Pebesma
> Cc: Chris Taylor; r-sig-geo at stat.math.ethz.ch
> Subject: Re: [R-sig-Geo] cokriging question
>
>
>
> Thanks for the quick responses.
> I've often done global OK or UK that can take ~ 2-3 days to complete. I
> always assumed that it was b/c the matrices were so large. Looking at
> task manager indicated that Rgui.exe only consumed ~ 800 Mb of RAM
> during the processes.
> I'll try it by passing the maxdist to gstat() rather than predict.gstat().
> The only other time I encounter the memory.c line is when I neglect to
> remove duplicate observations.
>
> I think I only should have at most 500 or so points within a 100-200m
> maxdist...
>
>
>
>
> Edzer Pebesma wrote:
>   
>> Good morning Dave (late afternoon here),
>>
>> Chris Taylor wrote:
>>     
>>> Good morning Edzer and Dave,
>>> Thanks for bringing up this point.  I had a similar issue recently
>>> using krige().  Observations at 5800 locations, attempting to krige()
>>> predictions at 112,000 locations resulted in the same "memory.c"
>>> error message.  Reducing predicted locations to <<50,000 and reducing
>>> max.dist seemed to help, but the predictions still took a very long
>>> time (>2 hours).  (Running winxp with 4GB memory.)
>>> Can you clarify your suspicion regarding the "lack of standardization
>>> of coordinates"?
>>>       
>> In this message, a trend was modelled based on x and y coordinates, as
>> follows:
>>       x                y                DN4          indicator4
>> Min.   :670462   Min.   :4215236   Min.   :18.00   Min.   :0.0000
>> 1st Qu.:670683   1st Qu.:4215456   1st Qu.:24.00   1st Qu.:0.0000
>> Median :670904   Median :4215677   Median :32.50   Median :0.0000
>> Mean   :670904   Mean   :4215677   Mean   :43.26   Mean   :0.4795
>> 3rd Qu.:671125   3rd Qu.:4215898   3rd Qu.:64.00   3rd Qu.:1.0000
>> Max.   :671346   Max.   :4216119   Max.   :87.00   Max.   :1.0000
>>     
>> g<-gstat(id="indicator6",formula=indicator6~x+y+x*y+sqrt(x)+sqrt(y),location=~x+y,data=band6.data,...
>>
>>
>> computing the x*y will give numbers many orders of magnitude larger
>> than sqrt(x),or the intercept. The advice is usually to (somewhat)
>> standardize coordinates before using them as a trend. But I doubt this
>> helps you very much.
>>
>> I find it hard to consider > 2 hours as a very long time before I know
>> all the details (e.g. how many points were there within your
>> maxdist?), the reason why you want an instant answer, and preferably
>> have heard a comparison with other software. If you then tell me what
>> your budget is, I might come up with possible solutions (starting very
>> cheap, e.g. look at demo(snow) in package gstat, or use an OS that can
>> assign this 4Gb to a single process).
>> --
>> Edzer
>>     
>>> Chris
>>>
>>> Edzer Pebesma wrote:
>>>       
>>>> Dave,
>>>>
>>>> 12000 observations fit, in the c representation, in less than 1 Mb
>>>> (64 bytes per observation).
>>>>
>>>> The issue is that you think that passing maxdist to predict.gstat
>>>> has an effect. It doesn't; you need to pass it to function gstat().
>>>>
>>>> The same thing happened in this
>>>> https://stat.ethz.ch/pipermail/r-sig-geo/2008-September/004182.html
>>>> message, where nmax was passed to predict.gstat, and simulation took
>>>> forever. The other issue in that question was, I suspect, lack of
>>>> standardization of coordinates, used in a trend surface.
>>>> --
>>>> Edzer
>>>>
>>>> Dave Depew wrote:
>>>>         
>>>>> Is there a limit to the # of observations or size of file that can
>>>>> be co-kriged in gstat?
>>>>> I have a ~12000 observation data set (2 variables), the variograms,
>>>>> cross variogram and lmc are fit well, and co-kriging starts ok
>>>>>
>>>>> Linear Model of Coregionalization found. Good.
>>>>> [using ordinary cokriging]
>>>>>
>>>>> then immediately outputs
>>>>>
>>>>> "memory.c", line 57: can't allocate memory in function m_get()
>>>>> Error in predict.gstat(fit.ck, newdata = EcoSAV.grid, maxdist = 100) :
>>>>>  m_get
>>>>>
>>>>> Iv tried different maxdist from 10 to 1000, with exactly the same
>>>>> result.
>>>>> I recently upgraded my RAM to 4Gb and flipped the windows XP /3GB
>>>>> switch.
>>>>>
>>>>>
>>>>>           
>
>
> --
> David Depew
> PhD Candidate
> Department of Biology
> University of Waterloo
> 200 University Ave W.
> Waterloo, ON. Canada
> N2L 3G1
>
> (T) 1-519-888-4567 x33895
> (F) 1-519-746-0614
>
> http://www.science.uwaterloo.ca/~ddepew
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>   

-- 
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of Münster,
Weseler Straße 253, 48151 Münster, Germany.  Phone: +49 251
8333081, Fax: +49 251 8339763  http://ifgi.uni-muenster.de/
http://www.springer.com/978-0-387-78170-9




More information about the R-sig-Geo mailing list