[R-sig-Geo] Odd behavior of dismo's extract function
Michael Sumner
mdsumner at gmail.com
Mon Jul 25 04:15:02 CEST 2016
On Mon, 25 Jul 2016 at 11:35 Dan Warren <dan.l.warren at gmail.com> wrote:
> Just realized I pasted in the results backwards. It should have been
>
> system.time(extract.test(env, 250))
>
> user system elapsed
> 124.562 0.516 125.061
>
> system.time(extract.test(env, 251))
>
> user system elapsed
> 2.807 0.084 2.891
>
I don't see the effect.
Perhaps it was fixed in a recent version of raster?
Please post reproducible details; I downloaded your data files to
"test/testdata/" to try this.
Cheers, Mike.
library(raster)
library(dismo)
extract.test <- function(env, N){
extract(env, dismo::randomPoints(env, N))
}
env.files <- list.files(path = "test/testdata/", pattern = "pc",
full.names = TRUE)
env <- raster::stack(env.files)
library(rbenchmark)
benchmark(n250 = extract.test(env, 250),
n251 = extract.test(env, 251), replications = 4)
#   test replications elapsed relative user.self sys.self user.child sys.child
# 1 n250            4    6.31    1.008      5.13     1.14         NA        NA
# 2 n251            4    6.26    1.000      5.02     1.22         NA        NA
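To check the whole range of point counts rather than just 250 vs 251, something like the following might help. It is only a sketch: time_by_n is a hypothetical helper name (not part of raster or dismo), and it simply wraps system.time() around whatever function you pass in.

```r
# Hypothetical helper (not from raster or dismo): time a function of N
# for each value in `ns` and return a numpoints/time data frame.
time_by_n <- function(f, ns) {
  elapsed <- vapply(ns, function(n) {
    # system.time() returns user, system and elapsed times; keep elapsed only
    unname(system.time(f(n))["elapsed"])
  }, numeric(1))
  data.frame(numpoints = ns, time = elapsed)
}

# Assumed usage with the objects defined above (needs `env` and `extract.test`):
# time_by_n(function(n) extract.test(env, n), c(200, 250, 251, 252, 500))
```

Since randomPoints draws new points on every call, repeating the sweep a few times would show whether any jump is stable.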
devtools::session_info()
# Session info ---------------------------------------------------------------
# setting value
# version R version 3.3.1 Patched (2016-07-09 r70874)
# system x86_64, mingw32
# ui RStudio (0.99.1261)
# language (EN)
# collate English_Australia.1252
# tz Australia/Hobart
# date 2016-07-25
#
# Packages -------------------------------------------------------------------
# package * version date source
# devtools * 1.12.0 2016-06-24 CRAN (R 3.3.1)
# digest 0.6.9 2016-01-08 CRAN (R 3.3.1)
# dismo * 1.1-1 2016-06-16 CRAN (R 3.3.1)
# evaluate 0.9 2016-04-29 CRAN (R 3.3.1)
# htmltools 0.3.5 2016-03-21 CRAN (R 3.3.1)
# knitr 1.13 2016-05-09 CRAN (R 3.3.1)
# lattice 0.20-33 2015-07-14 CRAN (R 3.3.1)
# magrittr 1.5 2014-11-22 CRAN (R 3.3.1)
# memoise 1.0.0 2016-01-29 CRAN (R 3.3.1)
# raster * 2.5-8 2016-06-02 CRAN (R 3.3.1)
# rbenchmark * 1.0.0 2012-08-30 CRAN (R 3.3.0)
# Rcpp 0.12.5 2016-05-14 CRAN (R 3.3.1)
# rgdal 1.1-10 2016-05-12 CRAN (R 3.3.1)
# rmarkdown 1.0.2 2016-07-19 Github (rstudio/rmarkdown@b65e177)
# sp * 1.2-3 2016-04-14 CRAN (R 3.3.1)
# stringi 1.1.1 2016-05-27 CRAN (R 3.3.0)
# stringr 1.0.0 2015-04-30 CRAN (R 3.3.1)
# withr 1.0.2 2016-06-20 CRAN (R 3.3.1)
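A quick way to compare your installation against the versions listed above is sketched here. The helper name meets_version is made up for illustration; it relies only on requireNamespace() and utils::packageVersion(), and I am assuming (not asserting) that a version difference is what matters.

```r
# Hypothetical helper: TRUE if `pkg` is installed at `minimum` or newer,
# FALSE if it is older, NA if the package is not installed at all.
meets_version <- function(pkg, minimum) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(NA)
  }
  utils::packageVersion(pkg) >= minimum
}

# Versions taken from the session info above
meets_version("raster", "2.5-8")
meets_version("dismo", "1.1-1")
```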
> Dan Warren, Ph.D.
> Department of Biology
> Macquarie University
> Email: dan.warren at mq.edu.au <dan.warren at anu.edu.au>
> Phone (US): 530-848-3809
> Phone (Australia): 0468 696 897
> Phone (Work): 02 9850 8587
> Skype: dan.l.warren
> Google Scholar
> <https://scholar.google.com/citations?user=NTzu9c8AAAAJ&hl=en> Orcid
> <http://orcid.org/0000-0002-8747-2451> ResearcherID
> <http://www.researcherid.com/rid/B-3821-2010> Scopus
> <http://www.scopus.com/authid/detail.url?authorId=7202133982>
>
> On Mon, Jul 25, 2016 at 10:34 AM, Dan Warren <dan.l.warren at gmail.com>
> wrote:
>
> > This is not an error per se so much as just something very weird that I
> > have noticed with a project I've been working on recently. I'm wondering
> > if anyone here has any insight as to what may be causing this behavior. I
> > haven't yet been able to duplicate it with simulated rasters (more info on
> > that below), but it appears very reliably with real environmental data,
> > including the PC rasters for Cuba I have hosted here:
> >
> > https://github.com/danlwarren/ENMTools/tree/master/test/testdata
> >
> > What's happening is this: if I go to extract data from those rasters using
> > occurrence points, the amount of time it takes increases very rapidly up to
> > exactly 250 points, and falls dramatically after that. So dramatically
> > that it takes over two minutes to extract data for 250 points but just
> > under three seconds for 251. I've established that it's not a question of
> > the points themselves being wonky, because it happens with random points as
> > well.
> >
> >
> > extract.test <- function(env, N){
> > extract(env, randomPoints(env, N))
> > }
> >
> > env.files <- list.files(path = "testdata/", pattern = "pc", full.names =
> > TRUE)
> > env <- stack(env.files)
> >
> > system.time(extract.test(env, 250))
> >
> > user system elapsed
> > 2.807 0.084 2.891
> >
> > system.time(extract.test(env, 251))
> >
> > user system elapsed
> > 124.562 0.516 125.061
> >
> > numpoints,time
> > 1,1.54
> > 5,3.93
> > 10,6.764
> > 50,29.939
> > 100,61.431
> > 150,79.295
> > 200,110.283
> > 250,120.118
> > 251,2.748
> > 252,2.756
> > 254,2.767
> > 500,2.876
> > 1000,3.153
> >
> > The data being extracted looks perfectly reasonable in all cases. It's
> > not just these layers, either. Although (as I mentioned above) I have yet
> > to come up with simulated rasters that show this behavior, I see this
> > behavior for both of the sets of rasters for real environmental data that
> > I've tried. The results above are from a PCA on Worldclim data for Cuba,
> > but I just tried them on some Climond data I've got for Australia and I get
> > the same behavior. Those rasters are much larger, though, and as a result
> > the times are longer; 251 points took about 43 seconds, whereas I just had
> > to give up and stop the 250 point extraction after about 30 minutes.
> >
> > As for those simulated rasters, I've tried the following:
> >
> > Plain grids of sequential numbers
> > As above, but with a bunch of NAs added
> > Filling the Cuban rasters with sequential numbers
> > Filling the Cuban rasters with random numbers from a uniform (0,1)
> > distribution
> >
> > None of those show this issue. Anyone have any thoughts about what might
> > be going on here?
> >
> >
>
>
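For anyone wanting to try the simulated rasters described in the quoted message, here is one possible way to build them. This is my guess at the setup, not the code that was actually used: make_test_rasters, its defaults, and the 20% NA fraction are all assumptions for illustration.

```r
library(raster)

# Hypothetical helper: build a three-layer stack of simulated rasters,
# either on a plain grid or on the geometry of an existing Raster* object.
make_test_rasters <- function(template = NULL, nrows = 100, ncols = 100,
                              na_fraction = 0.2) {
  if (is.null(template)) {
    # plain grid with an arbitrary unit extent (assumption)
    template <- raster(nrows = nrows, ncols = ncols,
                       xmn = 0, xmx = 1, ymn = 0, ymx = 1)
  } else {
    # keep only the geometry of the supplied raster, drop its values
    template <- raster(template)
  }
  n <- ncell(template)
  sequential <- setValues(template, seq_len(n))  # sequential numbers
  with_na <- sequential
  with_na[sample(n, floor(n * na_fraction))] <- NA  # scatter some NAs
  uniform <- setValues(template, runif(n))       # uniform (0,1) values
  s <- stack(sequential, with_na, uniform)
  names(s) <- c("sequential", "with_na", "uniform")
  s
}

# Assumed usage:
# sim <- make_test_rasters()
# system.time(extract(sim, dismo::randomPoints(sim, 250)))
```

Note that passing the Cuban stack as template replaces every cell, so the original NA mask is not preserved in the sequential and uniform layers; that may or may not match what was tried.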
--
Dr. Michael Sumner
Software and Database Engineer
Australian Antarctic Division
203 Channel Highway
Kingston Tasmania 7050 Australia