[R-sig-Geo] Odd behavior of dismo's extract function

Dan Warren dan.l.warren at gmail.com
Mon Jul 25 03:34:43 CEST 2016


Just realized I pasted in the results backwards.  It should have been

system.time(extract.test(env, 250))

   user  system elapsed
124.562   0.516 125.061

system.time(extract.test(env, 251))

   user  system elapsed
  2.807   0.084   2.891
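
The sweep over point counts in the table quoted below could be scripted roughly as follows. This is only a sketch: `time.by.n` and `dummy` are hypothetical names that do not appear in the original message, and the stand-in workload is there purely so the snippet runs without the raster data.

```r
# Hypothetical helper: run f(n) once for each n in ns and record the
# elapsed wall-clock time (in seconds) reported by system.time().
time.by.n <- function(f, ns) {
  elapsed <- vapply(ns, function(n) {
    unname(system.time(f(n))["elapsed"])
  }, numeric(1))
  data.frame(numpoints = ns, time = elapsed)
}

# Stand-in workload (an assumption, not the real extraction) so the
# sketch is runnable on its own.
dummy <- function(n) sum(sqrt(seq_len(n)))

ns <- c(1, 5, 10, 50, 100, 150, 200, 250, 251, 252, 254, 500, 1000)
timings <- time.by.n(dummy, ns)
print(timings)
```

With the Cuba rasters loaded as `env`, the same helper could presumably be called as `time.by.n(function(n) extract.test(env, n), ns)` to reproduce the table.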



Dan Warren, Ph.D.
Department of Biology
Macquarie University
Email: dan.warren at mq.edu.au <dan.warren at anu.edu.au>
Phone (US): 530-848-3809
Phone (Australia): 0468 696 897
Phone (Work): 02 9850 8587
Skype: dan.l.warren
Google Scholar
<https://scholar.google.com/citations?user=NTzu9c8AAAAJ&hl=en>  Orcid
<http://orcid.org/0000-0002-8747-2451>  ResearcherID
<http://www.researcherid.com/rid/B-3821-2010>  Scopus
<http://www.scopus.com/authid/detail.url?authorId=7202133982>

On Mon, Jul 25, 2016 at 10:34 AM, Dan Warren <dan.l.warren at gmail.com> wrote:

> This is not an error per se so much as just something very weird that I
> have noticed with a project I've been working on recently.  I'm wondering
> if anyone here has any insight as to what may be causing this behavior.  I
> haven't yet been able to duplicate it with simulated rasters (more info on
> that below), but it appears very reliably with real environmental data
> including the PC rasters for Cuba I have hosted here:
>
> https://github.com/danlwarren/ENMTools/tree/master/test/testdata
>
> What's happening is this: if I go to extract data from those rasters using
> occurrence points, the amount of time it takes increases very rapidly up to
> exactly 250 points, and falls dramatically after that.  So dramatically
> that it takes over two minutes to extract data for 250 points but just
> under three seconds for 251.  I've established that it's not a question of
> the points themselves being wonky, because it happens with random points as
> well.
>
>
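> library(dismo)  # provides randomPoints(); also attaches raster for stack() and extract()
>
> # Extract values from every layer of env at N random points
> extract.test <- function(env, N){
>       extract(env, randomPoints(env, N))
> }
>
> env.files <- list.files(path = "testdata/", pattern = "pc", full.names = TRUE)
> env <- stack(env.files)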
>
> system.time(extract.test(env, 250))
>
>    user  system elapsed
>   2.807   0.084   2.891
>
> system.time(extract.test(env, 251))
>
>    user  system elapsed
> 124.562   0.516 125.061
>
> numpoints,time (seconds)
> 1,1.54
> 5,3.93
> 10,6.764
> 50,29.939
> 100,61.431
> 150,79.295
> 200,110.283
> 250,120.118
> 251,2.748
> 252,2.756
> 254,2.767
> 500,2.876
> 1000,3.153
>
> The data being extracted looks perfectly reasonable in all cases.  It's
> not just these layers, either.  Although (as I mentioned above) I have yet
> to come up with simulated rasters that show this behavior, I see this
> behavior for both of the sets of rasters for real environmental data that
> I've tried.  The results above are from a PCA on Worldclim data for Cuba,
> but I just tried them on some Climond data I've got for Australia and I get
> the same behavior.  Those rasters are much larger, though, and as a result the
> times are longer; 251 points took about 43 seconds, whereas I just had to
> give up and stop the 250 point extraction after about 30 minutes.
>
> As for those simulated rasters, I've tried the following:
>
> Plain grids of sequential numbers
> As above, but with a bunch of NAs added
> Filling the Cuban rasters with sequential numbers
> Filling the Cuban rasters with random numbers from a uniform (0,1)
> distribution
>
> None of those show this issue.  Anyone have any thoughts about what might
> be going on here?
>
>
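
For reference, simulated layers along the lines of those listed above might be built like this with the raster package. This is a sketch under assumptions: the grid size, the number of NA cells, and all object names are made up for illustration, and the layers here live in memory, whereas the message does not say whether the simulated rasters were in memory or written to disk.

```r
library(raster)

# Assumed grid dimensions; the original message does not give any.
template <- raster(nrows = 200, ncols = 200,
                   xmn = 0, xmx = 200, ymn = 0, ymx = 200)

# Plain grid of sequential numbers
seq.r <- setValues(template, 1:ncell(template))

# As above, but with a bunch of NAs added (5000 cells is arbitrary)
na.r <- seq.r
na.r[sample(ncell(na.r), 5000)] <- NA

# Random numbers from a uniform (0,1) distribution
unif.r <- setValues(template, runif(ncell(template)))

sim <- stack(seq.r, na.r, unif.r)
```

To fill an existing layer instead of a blank grid, `setValues(env[[1]], 1:ncell(env))` would keep the extent and resolution of the Cuban raster while replacing its values. The resulting stack can then be passed to `extract.test(sim, 250)` and `extract.test(sim, 251)` for comparison.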



