[R-sig-Geo] Odd behavior of dismo's extract function

Michael Sumner mdsumner at gmail.com
Mon Jul 25 04:15:02 CEST 2016


On Mon, 25 Jul 2016 at 11:35 Dan Warren <dan.l.warren at gmail.com> wrote:

> Just realized I pasted in the results backwards.  It should have been
>
> system.time(extract.test(env, 250))
>
>    user  system elapsed
> 124.562   0.516 125.061
>
> system.time(extract.test(env, 251))
>
>    user  system elapsed
>   2.807   0.084   2.891
>
>
>
>
I don't see the effect.

Perhaps it was fixed in recent version of raster?

Please post reproducible details, I downloaded your data files to
"test/testdata/" to try this.

Cheers, Mike.


library(raster)
library(dismo)
extract.test <- function(env, N){
  extract(env, dismo::randomPoints(env, N))
}

env.files <- list.files(path = "test/testdata/", pattern = "pc", full.names
=
                          TRUE)
env <- raster::stack(env.files)

library(rbenchmark)
benchmark(n250 = extract.test(env, 250),
          n251 = extract.test(env, 251), replications = 4)
# test replications elapsed relative user.self sys.self user.child sys.child
# 1 n250            4    6.31    1.008      5.13     1.14         NA
 NA
# 2 n251            4    6.26    1.000      5.02     1.22         NA
 NA
devtools::session_info()
# Session info
-------------------------------------------------------------------------------------------------------------------------------
#   setting  value
# version  R version 3.3.1 Patched (2016-07-09 r70874)
# system   x86_64, mingw32
# ui       RStudio (0.99.1261)
# language (EN)
# collate  English_Australia.1252
# tz       Australia/Hobart
# date     2016-07-25
#
# Packages
-----------------------------------------------------------------------------------------------------------------------------------
#   package    * version date       source
# devtools   * 1.12.0  2016-06-24 CRAN (R 3.3.1)
# digest       0.6.9   2016-01-08 CRAN (R 3.3.1)
# dismo      * 1.1-1   2016-06-16 CRAN (R 3.3.1)
# evaluate     0.9     2016-04-29 CRAN (R 3.3.1)
# htmltools    0.3.5   2016-03-21 CRAN (R 3.3.1)
# knitr        1.13    2016-05-09 CRAN (R 3.3.1)
# lattice      0.20-33 2015-07-14 CRAN (R 3.3.1)
# magrittr     1.5     2014-11-22 CRAN (R 3.3.1)
# memoise      1.0.0   2016-01-29 CRAN (R 3.3.1)
# raster     * 2.5-8   2016-06-02 CRAN (R 3.3.1)
# rbenchmark * 1.0.0   2012-08-30 CRAN (R 3.3.0)
# Rcpp         0.12.5  2016-05-14 CRAN (R 3.3.1)
# rgdal        1.1-10  2016-05-12 CRAN (R 3.3.1)
# rmarkdown    1.0.2   2016-07-19 Github (rstudio/rmarkdown at b65e177)
# sp         * 1.2-3   2016-04-14 CRAN (R 3.3.1)
# stringi      1.1.1   2016-05-27 CRAN (R 3.3.0)
# stringr      1.0.0   2015-04-30 CRAN (R 3.3.1)
# withr        1.0.2   2016-06-20 CRAN (R 3.3.1)





> Dan Warren, Ph.D.
> Department of Biology
> Macquarie University
> Email: dan.warren at mq.edu.au <dan.warren at anu.edu.au>
> Phone (US): 530-848-3809
> Phone (Australia): 0468 696 897
> Phone (Work): 02 9850 8587
> Skype: dan.l.warren
> Google Scholar
> <https://scholar.google.com/citations?user=NTzu9c8AAAAJ&hl=en>  Orcid
> <http://orcid.org/0000-0002-8747-2451>  ResearcherID
> <http://www.researcherid.com/rid/B-3821-2010>  Scopus
> <http://www.scopus.com/authid/detail.url?authorId=7202133982>
>
> On Mon, Jul 25, 2016 at 10:34 AM, Dan Warren <dan.l.warren at gmail.com>
> wrote:
>
> > This is not an error per se so much as just something very weird that I
> > have noticed with a project I've been working on recently.  I'm wondering
> > if anyone here has any insight as to what may be causing this behavior.
> I
> > haven't yet been able to duplicate it with simulated rasters (more info
> on
> > that below), but it appears very reliably with real environmental data
> > including the PC rasters for Cuba I have hosted here:
> >
> > https://github.com/danlwarren/ENMTools/tree/master/test/testdata
> >
> > What's happening is this: if I go to extract data from those rasters
> using
> > occurrence points, the amount of time it takes increases very rapidly up
> to
> > exactly 250 points, and falls dramatically after that.  So dramatically
> > that it takes over two minutes to extract data for 250 points but just
> > under three seconds for 251.  I've established that it's not a question
> of
> > the points themselves being wonky, because it happens with random points
> as
> > well.
> >
> >
> > extract.test <- function(env, N){
> >       extract(env, randomPoints(env, N))
> > }
> >
> > env.files <- list.files(path = "testdata/", pattern = "pc", full.names =
> > TRUE)
> > env <- stack(env.files)
> >
> > system.time(extract.test(env, 250))
> >
> >    user  system elapsed
> >   2.807   0.084   2.891
> >
> > system.time(extract.test(env, 251))
> >
> >    user  system elapsed
> > 124.562   0.516 125.061
> >
> > numpoints,time
> > 1,1.54
> > 5,3.93
> > 10,6.764
> > 50,29.939
> > 100,61.431
> > 150,79.295
> > 200,110.283
> > 250,120.118
> > 251,2.748
> > 252,2.756
> > 254,2.767
> > 500,2.876
> > 1000,3.153
> >
> > The data being extracted looks perfectly reasonable in all cases.  It's
> > not just these layers, either.  Although (as I mentioned above) I have
> yet
> > to come up with simulated rasters that show this behavior, I see this
> > behavior for both of the sets of rasters for real environmental data that
> > I've tried.  The results above are from a PCA on Worldclim data for Cuba,
> > but I just tried them on some Climond data I've got for Australia and I
> get
> > the same behavior.  Those rasters are much larger, though, and a result
> the
> > times are longer; 251 points took about 43 seconds, whereas I just had to
> > give up and stop the 250 point extraction after about 30 minutes.
> >
> > As for those simulated rasters, I've tried the following:
> >
> > Plain grids of sequential numbers
> > As above, but with a bunch of NAs added
> > Filling the Cuban rasters with sequential numbers
> > Filling the Cuban rasters with random numbers from a uniform (0,1)
> > distribution
> >
> > None of those show this issue.  Anyone have any thoughts about what might
> > be going on here?
> >
> >
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
-- 
Dr. Michael Sumner
Software and Database Engineer
Australian Antarctic Division
203 Channel Highway
Kingston Tasmania 7050 Australia

	[[alternative HTML version deleted]]



More information about the R-sig-Geo mailing list