[R-sig-Geo] raster[] slow on large rasters

Kenny Bell kmb56 at berkeley.edu
Mon Oct 3 01:05:05 CEST 2016


I am unsure if the file is tiled - how do I find this out? I am finding
that sampling the cells directly and using `[` is also slow, though not as
slow as sampleRandom. Is that what you meant by "index cell extract"?

Using readAll isn't going to work as it reads in very slowly and is large
(~100GB).

On Sun, Oct 2, 2016 at 3:54 PM, Michael Sumner <mdsumner at gmail.com> wrote:

> Is the file tiled? Raster's extract is slow then because it scans line by
> line rather than by tile. The only fix I know  is to readAll into memory or
> write to a new untiled file. At any rate you might as well sample the cell
> numbers more directly and use index cell extract instead of sampleRandom
>
> On Mon, 3 Oct 2016, 09:40 Kenny Bell <kmb56 at berkeley.edu> wrote:
>
>> Is an approach that could improve this is to arrange the locations to
>> collect into contiguous blocks inside raster:::.readCellsGDAL and read them
>> in block by block?
>>
>> On Sun, Oct 2, 2016 at 3:32 PM, Kenny Bell <kmb56 at berkeley.edu> wrote:
>>
>>> No substantial difference, no.
>>>
>>> cdl <- brick("Data/CDL/2015_30m_cdls/2015_30m_cdls.img")
>>> system.time(raster::sampleRandom(cdl, size = 100))
>>> #   user  system elapsed
>>> #   4.16   21.32   25.50
>>> system.time(cdl[random_pts$row_1D[1:100]])
>>> #   user  system elapsed
>>> #   1.33    5.36    6.69
>>>
>>> cdl <- raster("Data/CDL/2015_30m_cdls/2015_30m_cdls.img")
>>> system.time(raster::sampleRandom(cdl, size = 100))
>>> #   user  system elapsed
>>> #   4.07   21.34   25.46
>>> system.time(cdl[random_pts$row_1D[1:100]])
>>> #   user  system elapsed
>>> #   1.20    4.97    6.17
>>>
>>>
>>>
>>> On Sun, Oct 2, 2016 at 2:47 PM, Michael Sumner <mdsumner at gmail.com>
>>> wrote:
>>>
>>>> Try creating it as a single layer brick, does it make a difference?
>>>>
>>>> Cheers, Mike
>>>>
>>>> On Mon, 3 Oct 2016, 08:26 Kenny Bell <kmb56 at berkeley.edu> wrote:
>>>>
>>>>> I am trying to sample points from a large RasterLayer (~100GB if read
>>>>> into
>>>>> memory).
>>>>>
>>>>> raster::sampleRandom relies on raster raster:::.readCellsGDAL which
>>>>> seems
>>>>> to loop through rows, read in entire columns using
>>>>> rgdal::getRasterData,
>>>>> and subset those columns in R.
>>>>>
>>>>> Sampling 100000 pts from this raster is only a few per column, so this
>>>>> isn't efficient.
>>>>>
>>>>> Using my own random numbers with `[` also relies on
>>>>> raster:::.readCellsGDAL.
>>>>>
>>>>> Does anyone have a suggestion for a better practice?
>>>>>
>>>>> The raster is public so this code should be reproducible:
>>>>>
>>>>> download:
>>>>> ftp://ftp.nass.usda.gov/download/res/2015_30m_cdls.zip
>>>>>
>>>>> cdl <- raster("2015_30m_cdls/2015_30m_cdls.img")
>>>>> raster::sampleRandom(cdl, size = 100000) # slow
>>>>>
>>>>> Cheers,
>>>>> Kenny
>>>>>
>>>>>         [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-Geo mailing list
>>>>> R-sig-Geo at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>
>>>> --
>>>> Dr. Michael Sumner
>>>> Software and Database Engineer
>>>> Australian Antarctic Division
>>>> 203 Channel Highway
>>>> Kingston Tasmania 7050 Australia
>>>>
>>>>
>>>
>>>
>>> --
>>> Kendon Bell
>>> Email: kmb56 at berkeley.edu
>>> Phone: (510) 612-3375
>>>
>>> Ph.D. Candidate
>>> Department of Agricultural & Resource Economics
>>> University of California, Berkeley
>>>
>>
>>
>>
>> --
>> Kendon Bell
>> Email: kmb56 at berkeley.edu
>> Phone: (510) 612-3375
>>
>> Ph.D. Candidate
>> Department of Agricultural & Resource Economics
>> University of California, Berkeley
>>
> --
> Dr. Michael Sumner
> Software and Database Engineer
> Australian Antarctic Division
> 203 Channel Highway
> Kingston Tasmania 7050 Australia
>
>


-- 
Kendon Bell
Email: kmb56 at berkeley.edu
Phone: (510) 612-3375

Ph.D. Candidate
Department of Agricultural & Resource Economics
University of California, Berkeley

	[[alternative HTML version deleted]]



More information about the R-sig-Geo mailing list