[R-sig-Geo] raster[] slow on large rasters

Kenny Bell kmb56 at berkeley.edu
Mon Oct 3 00:40:31 CEST 2016


Could one approach to improving this be to arrange the requested cells into
contiguous blocks inside raster:::.readCellsGDAL and read them in block by
block?
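
The block-by-block idea above could be sketched roughly as follows, in base R
only. This is an illustration under assumptions, not how raster actually
works internally: a plain matrix stands in for the file on disk, read_block()
is a hypothetical placeholder for a windowed GDAL read, and
read_cells_blocked() and block_rows = 64 are names and values I made up for
the example.

```r
# Stand-in for the raster on disk (assumption): values equal the cell number.
# raster numbers cells row by row from the top-left, hence byrow = TRUE.
nr <- 1000L
nc <- 800L
disk <- matrix(seq_len(nr * nc), nrow = nr, ncol = nc, byrow = TRUE)

# Hypothetical windowed read: returns rows row_from..row_to, all columns.
read_block <- function(row_from, row_to) {
  disk[row_from:row_to, , drop = FALSE]
}

# Group the requested cells by block of rows and read each block only once.
read_cells_blocked <- function(cells, nc, block_rows = 64L) {
  rows <- (cells - 1L) %/% nc + 1L        # 1-based row of each cell
  cols <- (cells - 1L) %% nc + 1L         # 1-based column of each cell
  block_id <- (rows - 1L) %/% block_rows  # block of rows each cell falls in
  out <- numeric(length(cells))
  for (idx in split(seq_along(cells), block_id)) {
    # Read only the span of rows actually needed within this block.
    first_row <- min(rows[idx])
    last_row <- max(rows[idx])
    block <- read_block(first_row, last_row)
    # Pick values by (row, col) pairs relative to the block's first row.
    out[idx] <- block[cbind(rows[idx] - first_row + 1L, cols[idx])]
  }
  out  # values in the same order as the requested cells
}

set.seed(1)
cells <- sample(nr * nc, 100)
vals <- read_cells_blocked(cells, nc)
```

Whether this helps in practice would depend on how the file is tiled or
compressed on disk, so block_rows would need tuning against the real data.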

On Sun, Oct 2, 2016 at 3:32 PM, Kenny Bell <kmb56 at berkeley.edu> wrote:

> No substantial difference, no.
>
> cdl <- brick("Data/CDL/2015_30m_cdls/2015_30m_cdls.img")
> system.time(raster::sampleRandom(cdl, size = 100))
> #   user  system elapsed
> #   4.16   21.32   25.50
> system.time(cdl[random_pts$row_1D[1:100]])
> #   user  system elapsed
> #   1.33    5.36    6.69
>
> cdl <- raster("Data/CDL/2015_30m_cdls/2015_30m_cdls.img")
> system.time(raster::sampleRandom(cdl, size = 100))
> #   user  system elapsed
> #   4.07   21.34   25.46
> system.time(cdl[random_pts$row_1D[1:100]])
> #   user  system elapsed
> #   1.20    4.97    6.17
>
>
>
> On Sun, Oct 2, 2016 at 2:47 PM, Michael Sumner <mdsumner at gmail.com> wrote:
>
>> Try creating it as a single layer brick, does it make a difference?
>>
>> Cheers, Mike
>>
>> On Mon, 3 Oct 2016, 08:26 Kenny Bell <kmb56 at berkeley.edu> wrote:
>>
>>> I am trying to sample points from a large RasterLayer (~100GB if read
>>> into memory).
>>>
>>> raster::sampleRandom relies on raster:::.readCellsGDAL, which seems
>>> to loop through rows, read in entire columns using rgdal::getRasterData,
>>> and subset those columns in R.
>>>
>>> Sampling 100000 points from this raster gives only a few per column, so
>>> this isn't efficient.
>>>
>>> Using my own random numbers with `[` also relies on
>>> raster:::.readCellsGDAL.
>>>
>>> Does anyone have a suggestion for a better practice?
>>>
>>> The raster is public so this code should be reproducible:
>>>
>>> download:
>>> ftp://ftp.nass.usda.gov/download/res/2015_30m_cdls.zip
>>>
>>> cdl <- raster("2015_30m_cdls/2015_30m_cdls.img")
>>> raster::sampleRandom(cdl, size = 100000) # slow
>>>
>>> Cheers,
>>> Kenny
>> --
>> Dr. Michael Sumner
>> Software and Database Engineer
>> Australian Antarctic Division
>> 203 Channel Highway
>> Kingston Tasmania 7050 Australia
>>
>>



-- 
Kendon Bell
Email: kmb56 at berkeley.edu
Phone: (510) 612-3375

Ph.D. Candidate
Department of Agricultural & Resource Economics
University of California, Berkeley
