[R-sig-Geo] Odd behavior of dismo's extract function
Nick Matzke
matzke at nimbios.org
Mon Jul 25 04:35:04 CEST 2016
I am on R 3.3.1 and don't get the huge time difference. There is a small
decrease in elapsed time from 250 to 251 points, though:
=============
wd = "~/Downloads/extract_weirdness/"
setwd(wd)
library(raster)
library(dismo)
extract.test <- function(env, N){
  extract(env, randomPoints(env, N))
}
env.files <- list.files(path = ".", pattern = "pc", full.names = TRUE)
env <- stack(env.files)
system.time(extract.test(env, 250))
   user  system elapsed
  1.455   0.043   1.492
Warning message:
In couldBeLonLat(mask) : CRS is NA. Assuming it is longitude/latitude
system.time(extract.test(env, 251))
   user  system elapsed
  1.137   0.033   1.158
Warning message:
In couldBeLonLat(mask) : CRS is NA. Assuming it is longitude/latitude
=============
...but I won't worry about it myself. Thanks for the solution though, that
was weird behavior!
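
For anyone who wants to check whether the jump is still there on their own setup, here is a minimal sketch of a timing sweep. The helper names time_sweep and sweep_extract are hypothetical (mine, not part of raster or dismo). The sweep itself uses only base R (system.time); sweep_extract assumes env is a RasterStack built with stack() as in the code above, and like extract.test it times randomPoints together with extract.

```r
# Time f(n) for each n in `sizes`; report the median elapsed seconds over `reps` runs.
# `time_sweep` is a hypothetical helper, not part of raster or dismo.
time_sweep <- function(f, sizes, reps = 1) {
  elapsed <- vapply(sizes, function(n) {
    runs <- replicate(reps, system.time(f(n))[["elapsed"]])
    median(runs)
  }, numeric(1))
  data.frame(numpoints = sizes, time = elapsed)
}

# Assumed usage with the rasters from this thread (needs raster and dismo
# installed and `env` built with stack() as above). Defined but not run here.
sweep_extract <- function(env, sizes = c(200, 249, 250, 251, 252, 500)) {
  time_sweep(function(n) raster::extract(env, dismo::randomPoints(env, n)), sizes)
}

# Self-contained demonstration with a cheap stand-in workload.
demo <- time_sweep(function(n) sum(sqrt(seq_len(n))), sizes = c(1000, 2000, 4000))
print(demo)
```

Calling sweep_extract(env) should give a numpoints/time table in the same shape as the one in Dan's original post, so the two can be compared directly.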
Nick
On Mon, Jul 25, 2016 at 12:29 PM, Dan Warren <dan.l.warren at gmail.com> wrote:
> Updating to R 3.3.1 fixed it. Thanks! Still baffled as to why the sudden
> dropoff between 250 and 251, but as long as it's working all is well.
>
> Cheers!
>
>
> On Mon, Jul 25, 2016 at 12:24 PM, Dan Warren <dan.l.warren at gmail.com>
> wrote:
>
> > How very odd. I'm using R 3.3.0, but as far as I can tell I'm using the
> > same package versions as you. I've tried this on two machines (12 core
> > Mac Pro and an older Macbook Pro) and I'm getting the same phenomenon on
> > both. Could it be a weird OSX thing? I'll try updating R and then if it
> > still persists I'll bootcamp over into Windows and see if it's happening
> > for me there.
> >
> >
> > My session info (sorry for not including that the first time):
> >
> > Session info
> > ------------------------------------------------------------------
> >  setting  value
> >  version  R version 3.3.0 (2016-05-03)
> >  system   x86_64, darwin13.4.0
> >  ui       RStudio (0.99.491)
> >  language (EN)
> >  collate  en_AU.UTF-8
> >  tz       Australia/Sydney
> >  date     2016-07-25
> >
> > Packages
> > ------------------------------------------------------------------
> >  package    * version date       source
> >  colorspace   1.2-6   2015-03-11 CRAN (R 3.3.0)
> >  devtools     1.12.0  2016-06-24 CRAN (R 3.3.0)
> >  digest       0.6.9   2016-01-08 CRAN (R 3.3.0)
> >  dismo      * 1.1-1   2016-06-16 CRAN (R 3.3.0)
> >  ENMTools   * 0.1     2016-07-25 local
> >  ggplot2    * 2.1.0   2016-03-01 CRAN (R 3.3.0)
> >  gridExtra  * 2.2.1   2016-02-29 CRAN (R 3.3.0)
> >  gtable       0.2.0   2016-02-26 CRAN (R 3.3.0)
> >  highr        0.6     2016-05-09 CRAN (R 3.3.0)
> >  knitr      * 1.13    2016-05-09 CRAN (R 3.3.0)
> >  lattice      0.20-33 2015-07-14 CRAN (R 3.3.0)
> >  memoise      1.0.0   2016-01-29 CRAN (R 3.3.0)
> >  munsell      0.4.3   2016-02-13 CRAN (R 3.3.0)
> >  plyr       * 1.8.4   2016-06-08 CRAN (R 3.3.0)
> >  raster     * 2.5-8   2016-06-02 CRAN (R 3.3.0)
> >  Rcpp         0.12.5  2016-05-14 CRAN (R 3.3.0)
> >  rgeos      * 0.3-19  2016-04-04 CRAN (R 3.3.0)
> >  rJava        0.9-8   2016-01-07 CRAN (R 3.3.0)
> >  scales       0.4.0   2016-02-26 CRAN (R 3.3.0)
> >  sp         * 1.2-3   2016-04-14 CRAN (R 3.3.0)
> >  viridis    * 0.3.4   2016-03-12 CRAN (R 3.3.0)
> >  withr        1.0.2   2016-06-20 CRAN (R 3.3.0)
> >
> >
> > On Mon, Jul 25, 2016 at 12:15 PM, Michael Sumner <mdsumner at gmail.com>
> > wrote:
> >
> >>
> >>
> >> On Mon, 25 Jul 2016 at 11:35 Dan Warren <dan.l.warren at gmail.com> wrote:
> >>
> >>> Just realized I pasted in the results backwards. It should have been
> >>>
> >>> system.time(extract.test(env, 250))
> >>>
> >>> user system elapsed
> >>> 124.562 0.516 125.061
> >>>
> >>> system.time(extract.test(env, 251))
> >>>
> >>> user system elapsed
> >>> 2.807 0.084 2.891
> >>>
> >>>
> >>>
> >>>
> >> I don't see the effect.
> >>
> >> Perhaps it was fixed in a recent version of raster?
> >>
> >> Please post reproducible details. I downloaded your data files to
> >> "test/testdata/" to try this.
> >>
> >> Cheers, Mike.
> >>
> >>
> >> library(raster)
> >> library(dismo)
> >> extract.test <- function(env, N){
> >>   extract(env, dismo::randomPoints(env, N))
> >> }
> >>
> >> env.files <- list.files(path = "test/testdata/", pattern = "pc", full.names = TRUE)
> >> env <- raster::stack(env.files)
> >>
> >> library(rbenchmark)
> >> benchmark(n250 = extract.test(env, 250),
> >> n251 = extract.test(env, 251), replications = 4)
> >> #   test replications elapsed relative user.self sys.self user.child sys.child
> >> # 1 n250            4    6.31    1.008      5.13     1.14         NA        NA
> >> # 2 n251            4    6.26    1.000      5.02     1.22         NA        NA
> >> devtools::session_info()
> >> # Session info
> >> # ------------------------------------------------------------------
> >> #  setting  value
> >> #  version  R version 3.3.1 Patched (2016-07-09 r70874)
> >> #  system   x86_64, mingw32
> >> #  ui       RStudio (0.99.1261)
> >> #  language (EN)
> >> #  collate  English_Australia.1252
> >> #  tz       Australia/Hobart
> >> #  date     2016-07-25
> >> #
> >> # Packages
> >> # ------------------------------------------------------------------
> >> #  package    * version date       source
> >> #  devtools   * 1.12.0  2016-06-24 CRAN (R 3.3.1)
> >> #  digest       0.6.9   2016-01-08 CRAN (R 3.3.1)
> >> #  dismo      * 1.1-1   2016-06-16 CRAN (R 3.3.1)
> >> #  evaluate     0.9     2016-04-29 CRAN (R 3.3.1)
> >> #  htmltools    0.3.5   2016-03-21 CRAN (R 3.3.1)
> >> #  knitr        1.13    2016-05-09 CRAN (R 3.3.1)
> >> #  lattice      0.20-33 2015-07-14 CRAN (R 3.3.1)
> >> #  magrittr     1.5     2014-11-22 CRAN (R 3.3.1)
> >> #  memoise      1.0.0   2016-01-29 CRAN (R 3.3.1)
> >> #  raster     * 2.5-8   2016-06-02 CRAN (R 3.3.1)
> >> #  rbenchmark * 1.0.0   2012-08-30 CRAN (R 3.3.0)
> >> #  Rcpp         0.12.5  2016-05-14 CRAN (R 3.3.1)
> >> #  rgdal        1.1-10  2016-05-12 CRAN (R 3.3.1)
> >> #  rmarkdown    1.0.2   2016-07-19 Github (rstudio/rmarkdown at b65e177)
> >> #  sp         * 1.2-3   2016-04-14 CRAN (R 3.3.1)
> >> #  stringi      1.1.1   2016-05-27 CRAN (R 3.3.0)
> >> #  stringr      1.0.0   2015-04-30 CRAN (R 3.3.1)
> >> #  withr        1.0.2   2016-06-20 CRAN (R 3.3.1)
> >>
> >>
> >>
> >>
> >>
> >>> Dan Warren, Ph.D.
> >>> Department of Biology
> >>> Macquarie University
> >>> Email: dan.warren at mq.edu.au <dan.warren at anu.edu.au>
> >>> Phone (US): 530-848-3809
> >>> Phone (Australia): 0468 696 897
> >>> Phone (Work): 02 9850 8587
> >>> Skype: dan.l.warren
> >>> Google Scholar
> >>> <https://scholar.google.com/citations?user=NTzu9c8AAAAJ&hl=en> Orcid
> >>> <http://orcid.org/0000-0002-8747-2451> ResearcherID
> >>> <http://www.researcherid.com/rid/B-3821-2010> Scopus
> >>> <http://www.scopus.com/authid/detail.url?authorId=7202133982>
> >>>
> >>>
> >>> On Mon, Jul 25, 2016 at 10:34 AM, Dan Warren <dan.l.warren at gmail.com>
> >>> wrote:
> >>>
> >>> > This is not an error per se so much as just something very weird that I
> >>> > have noticed with a project I've been working on recently. I'm wondering
> >>> > if anyone here has any insight as to what may be causing this behavior. I
> >>> > haven't yet been able to duplicate it with simulated rasters (more info on
> >>> > that below), but it appears very reliably with real environmental data
> >>> > including the PC rasters for Cuba I have hosted here:
> >>> >
> >>> > https://github.com/danlwarren/ENMTools/tree/master/test/testdata
> >>> >
> >>> > What's happening is this: if I go to extract data from those rasters using
> >>> > occurrence points, the amount of time it takes increases very rapidly up to
> >>> > exactly 250 points, and falls dramatically after that. So dramatically
> >>> > that it takes over two minutes to extract data for 250 points but just
> >>> > under three seconds for 251. I've established that it's not a question of
> >>> > the points themselves being wonky, because it happens with random points as
> >>> > well.
> >>> >
> >>> >
> >>> > extract.test <- function(env, N){
> >>> >   extract(env, randomPoints(env, N))
> >>> > }
> >>> >
> >>> > env.files <- list.files(path = "testdata/", pattern = "pc", full.names = TRUE)
> >>> > env <- stack(env.files)
> >>> >
> >>> > system.time(extract.test(env, 250))
> >>> >
> >>> > user system elapsed
> >>> > 2.807 0.084 2.891
> >>> >
> >>> > system.time(extract.test(env, 251))
> >>> >
> >>> > user system elapsed
> >>> > 124.562 0.516 125.061
> >>> >
> >>> > numpoints,time
> >>> > 1,1.54
> >>> > 5,3.93
> >>> > 10,6.764
> >>> > 50,29.939
> >>> > 100,61.431
> >>> > 150,79.295
> >>> > 200,110.283
> >>> > 250,120.118
> >>> > 251,2.748
> >>> > 252,2.756
> >>> > 254,2.767
> >>> > 500,2.876
> >>> > 1000,3.153
> >>> >
> >>> > The data being extracted looks perfectly reasonable in all cases. It's
> >>> > not just these layers, either. Although (as I mentioned above) I have yet
> >>> > to come up with simulated rasters that show this behavior, I see this
> >>> > behavior for both of the sets of rasters for real environmental data that
> >>> > I've tried. The results above are from a PCA on Worldclim data for Cuba,
> >>> > but I just tried them on some Climond data I've got for Australia and I get
> >>> > the same behavior. Those rasters are much larger, though, and as a result
> >>> > the times are longer; 251 points took about 43 seconds, whereas I just had
> >>> > to give up and stop the 250 point extraction after about 30 minutes.
> >>> >
> >>> > As for those simulated rasters, I've tried the following:
> >>> >
> >>> > Plain grids of sequential numbers
> >>> > As above, but with a bunch of NAs added
> >>> > Filling the Cuban rasters with sequential numbers
> >>> > Filling the Cuban rasters with random numbers from a uniform (0,1) distribution
> >>> >
> >>> > None of those show this issue. Anyone have any thoughts about what might
> >>> > be going on here?
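
The timings quoted above can be checked for the location of the drop with a few lines of base R. This is only a sketch: the numbers are copied from the numpoints,time table in the original post, and the variable names (timings, ratio, drop_at) are my own.

```r
# Timings copied from the numpoints,time table in the original post.
timings <- read.csv(text = "numpoints,time
1,1.54
5,3.93
10,6.764
50,29.939
100,61.431
150,79.295
200,110.283
250,120.118
251,2.748
252,2.756
254,2.767
500,2.876
1000,3.153")

# Ratio of each timing to the one before it; the smallest ratio marks the drop.
ratio <- timings$time[-1] / timings$time[-nrow(timings)]
drop_at <- which.min(ratio)

# Rows on either side of the largest relative drop.
print(timings[c(drop_at, drop_at + 1), ])
```

On the posted numbers this picks out the step from 250 to 251 points.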
> >>> >
> >>> >
> >>>
> >>> [[alternative HTML version deleted]]
> >>>
> >>> _______________________________________________
> >>> R-sig-Geo mailing list
> >>> R-sig-Geo at r-project.org
> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >>>
> >> --
> >> Dr. Michael Sumner
> >> Software and Database Engineer
> >> Australian Antarctic Division
> >> 203 Channel Highway
> >> Kingston Tasmania 7050 Australia
> >>
> >>
> >
>
>