[R-sig-Geo] Odd behavior of dismo's extract function

Nick Matzke matzke at nimbios.org
Mon Jul 25 04:35:04 CEST 2016


I am on R 3.3.1 and don't get the huge time difference.  There is a tiny
decrease though from 250 to 251:

=============
wd = "~/Downloads/extract_weirdness/"
setwd(wd)

library(raster)
library(dismo)


extract.test <- function(env, N){
      extract(env, randomPoints(env, N))
}

env.files <- list.files(path = ".", pattern = "pc", full.names =
TRUE)
env <- stack(env.files)


system.time(extract.test(env, 250))

   user  system elapsed
  1.455   0.043   1.492
Warning message:
In couldBeLonLat(mask) : CRS is NA. Assuming it is longitude/latitude


system.time(extract.test(env, 251))   user  system elapsed
  1.137   0.033   1.158
Warning message:
In couldBeLonLat(mask) : CRS is NA. Assuming it is longitude/latitude
=============

...but I won't worry about it myself.  Thanks for the solution though, that
was weird behavior!
Nick




On Mon, Jul 25, 2016 at 12:29 PM, Dan Warren <dan.l.warren at gmail.com> wrote:

> Updating to R 3.3.1 fixed it.  Thanks!  Still baffled as to why the sudden
> dropoff between 250 and 251, but as long as it's working all is well.
>
> Cheers!
>
>
> On Mon, Jul 25, 2016 at 12:24 PM, Dan Warren <dan.l.warren at gmail.com>
> wrote:
>
> > How very odd.  I'm using R 3.3.0, but as far as I can tell I'm using the
> > same package versions as you.  I've tried this on two machines (12 core
> Mac
> > Pro and an older Macbook Pro) and I'm getting the same phenomenon on
> both.
> > Could it be a weird OSX thing?  I'll try updating R and then if it still
> > persists I'll bootcamp over into Windows and see if it's happening for me
> > there.
> >
> >
> > My session info (sorry for not including that the first time):
> >
> > Session info
> >
> ----------------------------------------------------------------------------------------------------------------------------------------
> >  setting  value
> >  version  R version 3.3.0 (2016-05-03)
> >  system   x86_64, darwin13.4.0
> >  ui       RStudio (0.99.491)
> >  language (EN)
> >  collate  en_AU.UTF-8
> >  tz       Australia/Sydney
> >  date     2016-07-25
> >
> > Packages
> >
> --------------------------------------------------------------------------------------------------------------------------------------------
> >  package    * version date       source
> >  colorspace   1.2-6   2015-03-11 CRAN (R 3.3.0)
> >  devtools     1.12.0  2016-06-24 CRAN (R 3.3.0)
> >  digest       0.6.9   2016-01-08 CRAN (R 3.3.0)
> >  dismo      * 1.1-1   2016-06-16 CRAN (R 3.3.0)
> >  ENMTools   * 0.1     2016-07-25 local
> >  ggplot2    * 2.1.0   2016-03-01 CRAN (R 3.3.0)
> >  gridExtra  * 2.2.1   2016-02-29 CRAN (R 3.3.0)
> >  gtable       0.2.0   2016-02-26 CRAN (R 3.3.0)
> >  highr        0.6     2016-05-09 CRAN (R 3.3.0)
> >  knitr      * 1.13    2016-05-09 CRAN (R 3.3.0)
> >  lattice      0.20-33 2015-07-14 CRAN (R 3.3.0)
> >  memoise      1.0.0   2016-01-29 CRAN (R 3.3.0)
> >  munsell      0.4.3   2016-02-13 CRAN (R 3.3.0)
> >  plyr       * 1.8.4   2016-06-08 CRAN (R 3.3.0)
> >  raster     * 2.5-8   2016-06-02 CRAN (R 3.3.0)
> >  Rcpp         0.12.5  2016-05-14 CRAN (R 3.3.0)
> >  rgeos      * 0.3-19  2016-04-04 CRAN (R 3.3.0)
> >  rJava        0.9-8   2016-01-07 CRAN (R 3.3.0)
> >  scales       0.4.0   2016-02-26 CRAN (R 3.3.0)
> >  sp         * 1.2-3   2016-04-14 CRAN (R 3.3.0)
> >  viridis    * 0.3.4   2016-03-12 CRAN (R 3.3.0)
> >  withr        1.0.2   2016-06-20 CRAN (R 3.3.0)
> >
> >
> > On Mon, Jul 25, 2016 at 12:15 PM, Michael Sumner <mdsumner at gmail.com>
> > wrote:
> >
> >>
> >>
> >> On Mon, 25 Jul 2016 at 11:35 Dan Warren <dan.l.warren at gmail.com> wrote:
> >>
> >>> Just realized I pasted in the results backwards.  It should have been
> >>>
> >>> system.time(extract.test(env, 250))
> >>>
> >>>    user  system elapsed
> >>> 124.562   0.516 125.061
> >>>
> >>> system.time(extract.test(env, 251))
> >>>
> >>>    user  system elapsed
> >>>   2.807   0.084   2.891
> >>>
> >>>
> >>>
> >>>
> >> I don't see the effect.
> >>
> >> Perhaps it was fixed in recent version of raster?
> >>
> >> Please post reproducible details, I downloaded your data files to
> >> "test/testdata/" to try this.
> >>
> >> Cheers, Mike.
> >>
> >>
> >> library(raster)
> >> library(dismo)
> >> extract.test <- function(env, N){
> >>   extract(env, dismo::randomPoints(env, N))
> >> }
> >>
> >> env.files <- list.files(path = "test/testdata/", pattern = "pc",
> >> full.names =
> >>                           TRUE)
> >> env <- raster::stack(env.files)
> >>
> >> library(rbenchmark)
> >> benchmark(n250 = extract.test(env, 250),
> >>           n251 = extract.test(env, 251), replications = 4)
> >> # test replications elapsed relative user.self sys.self user.child
> >> sys.child
> >> # 1 n250            4    6.31    1.008      5.13     1.14         NA
> >>    NA
> >> # 2 n251            4    6.26    1.000      5.02     1.22         NA
> >>    NA
> >> devtools::session_info()
> >> # Session info
> >>
> -------------------------------------------------------------------------------------------------------------------------------
> >> #   setting  value
> >> # version  R version 3.3.1 Patched (2016-07-09 r70874)
> >> # system   x86_64, mingw32
> >> # ui       RStudio (0.99.1261)
> >> # language (EN)
> >> # collate  English_Australia.1252
> >> # tz       Australia/Hobart
> >> # date     2016-07-25
> >> #
> >> # Packages
> >>
> -----------------------------------------------------------------------------------------------------------------------------------
> >> #   package    * version date       source
> >> # devtools   * 1.12.0  2016-06-24 CRAN (R 3.3.1)
> >> # digest       0.6.9   2016-01-08 CRAN (R 3.3.1)
> >> # dismo      * 1.1-1   2016-06-16 CRAN (R 3.3.1)
> >> # evaluate     0.9     2016-04-29 CRAN (R 3.3.1)
> >> # htmltools    0.3.5   2016-03-21 CRAN (R 3.3.1)
> >> # knitr        1.13    2016-05-09 CRAN (R 3.3.1)
> >> # lattice      0.20-33 2015-07-14 CRAN (R 3.3.1)
> >> # magrittr     1.5     2014-11-22 CRAN (R 3.3.1)
> >> # memoise      1.0.0   2016-01-29 CRAN (R 3.3.1)
> >> # raster     * 2.5-8   2016-06-02 CRAN (R 3.3.1)
> >> # rbenchmark * 1.0.0   2012-08-30 CRAN (R 3.3.0)
> >> # Rcpp         0.12.5  2016-05-14 CRAN (R 3.3.1)
> >> # rgdal        1.1-10  2016-05-12 CRAN (R 3.3.1)
> >> # rmarkdown    1.0.2   2016-07-19 Github (rstudio/rmarkdown at b65e177)
> >> # sp         * 1.2-3   2016-04-14 CRAN (R 3.3.1)
> >> # stringi      1.1.1   2016-05-27 CRAN (R 3.3.0)
> >> # stringr      1.0.0   2015-04-30 CRAN (R 3.3.1)
> >> # withr        1.0.2   2016-06-20 CRAN (R 3.3.1)
> >>
> >>
> >>
> >>
> >>
> >>> Dan Warren, Ph.D.
> >>> Department of Biology
> >>> Macquarie University
> >>> Email: dan.warren at mq.edu.au <dan.warren at anu.edu.au>
> >>> Phone (US): 530-848-3809
> >>> Phone (Australia): 0468 696 897
> >>> Phone (Work): 02 9850 8587
> >>> Skype: dan.l.warren
> >>> Google Scholar
> >>> <https://scholar.google.com/citations?user=NTzu9c8AAAAJ&hl=en>  Orcid
> >>> <http://orcid.org/0000-0002-8747-2451>  ResearcherID
> >>> <http://www.researcherid.com/rid/B-3821-2010>  Scopus
> >>> <http://www.scopus.com/authid/detail.url?authorId=7202133982>
> >>>
> >>>
> >>> On Mon, Jul 25, 2016 at 10:34 AM, Dan Warren <dan.l.warren at gmail.com>
> >>> wrote:
> >>>
> >>> > This is not an error per se so much as just something very weird
> that I
> >>> > have noticed with a project I've been working on recently.  I'm
> >>> wondering
> >>> > if anyone here has any insight as to what may be causing this
> >>> behavior.  I
> >>> > haven't yet been able to duplicate it with simulated rasters (more
> >>> info on
> >>> > that below), but it appears very reliably with real environmental
> data
> >>> > including the PC rasters for Cuba I have hosted here:
> >>> >
> >>> > https://github.com/danlwarren/ENMTools/tree/master/test/testdata
> >>> >
> >>> > What's happening is this: if I go to extract data from those rasters
> >>> using
> >>> > occurrence points, the amount of time it takes increases very rapidly
> >>> up to
> >>> > exactly 250 points, and falls dramatically after that.  So
> dramatically
> >>> > that it takes over two minutes to extract data for 250 points but
> just
> >>> > under three seconds for 251.  I've established that it's not a
> >>> question of
> >>> > the points themselves being wonky, because it happens with random
> >>> points as
> >>> > well.
> >>> >
> >>> >
> >>> > extract.test <- function(env, N){
> >>> >       extract(env, randomPoints(env, N))
> >>> > }
> >>> >
> >>> > env.files <- list.files(path = "testdata/", pattern = "pc",
> full.names
> >>> =
> >>> > TRUE)
> >>> > env <- stack(env.files)
> >>> >
> >>> > system.time(extract.test(env, 250))
> >>> >
> >>> >    user  system elapsed
> >>> >   2.807   0.084   2.891
> >>> >
> >>> > system.time(extract.test(env, 251))
> >>> >
> >>> >    user  system elapsed
> >>> > 124.562   0.516 125.061
> >>> >
> >>> > numpoints,time
> >>> > 1,1.54
> >>> > 5,3.93
> >>> > 10,6.764
> >>> > 50,29.939
> >>> > 100,61.431
> >>> > 150,79.295
> >>> > 200,110.283
> >>> > 250,120.118
> >>> > 251,2.748
> >>> > 252,2.756
> >>> > 254,2.767
> >>> > 500,2.876
> >>> > 1000,3.153
> >>> >
> >>> > The data being extracted looks perfectly reasonable in all cases.
> It's
> >>> > not just these layers, either.  Although (as I mentioned above) I
> have
> >>> yet
> >>> > to come up with simulated rasters that show this behavior, I see this
> >>> > behavior for both of the sets of rasters for real environmental data
> >>> that
> >>> > I've tried.  The results above are from a PCA on Worldclim data for
> >>> Cuba,
> >>> > but I just tried them on some Climond data I've got for Australia and
> >>> I get
> >>> > the same behavior.  Those rasters are much larger, though, and a
> >>> result the
> >>> > times are longer; 251 points took about 43 seconds, whereas I just
> had
> >>> to
> >>> > give up and stop the 250 point extraction after about 30 minutes.
> >>> >
> >>> > As for those simulated rasters, I've tried the following:
> >>> >
> >>> > Plain grids of sequential numbers
> >>> > As above, but with a bunch of NAs added
> >>> > Filling the Cuban rasters with sequential numbers
> >>> > Filling the Cuban rasters with random numbers from a uniform (0,1)
> >>> > distribution
> >>> >
> >>> > None of those show this issue.  Anyone have any thoughts about what
> >>> might
> >>> > be going on here?
> >>> >
> >>> >
> >>>
> >>>         [[alternative HTML version deleted]]
> >>>
> >>> _______________________________________________
> >>> R-sig-Geo mailing list
> >>> R-sig-Geo at r-project.org
> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >>>
> >> --
> >> Dr. Michael Sumner
> >> Software and Database Engineer
> >> Australian Antarctic Division
> >> 203 Channel Highway
> >> Kingston Tasmania 7050 Australia
> >>
> >>
> >
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

	[[alternative HTML version deleted]]



More information about the R-sig-Geo mailing list