[R-sig-Geo] polygonValues (raster): Very slow

Robert J. Hijmans r.hijmans at gmail.com
Wed Jun 30 20:48:23 CEST 2010


> Except that with the vector I can discard the cells for which the
> original raster has no data.

To speed up your original set-up you could perhaps also run "trim(x)"
to remove the outer rows and columns of the raster that contain only
NA values, ideally after cropping first:
r <- crop(r, polygons)
r <- trim(r)
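
Put together with your original call, that would look something like
this (a sketch, using the object names from your first mail and
dropping weights=TRUE, since you no longer need it):

Br2 <- crop(Br, eugrd025EFDC)                # raster cut to the polygons' extent
Br2 <- trim(Br2)                             # remove the all-NA outer rows and columns
v   <- polygonValues(p = eugrd025EFDC, Br2)  # extract as before, on the smaller raster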

Robert


On Wed, Jun 30, 2010 at 11:35 AM, Agustin Lobo <alobolistas at gmail.com> wrote:
> Yes, right. Once weights are not used, keeping the vector is not much
> of an advantage.
> Except that with the vector I can discard the cells for which the
> original raster has no data.
> But this alone is not going to compensate for that. I'll try both and let you know.
>
> Thanks a lot for your help.
>
> Agus
>
> 2010/6/30 Robert J. Hijmans <r.hijmans at gmail.com>:
>> Agus, you should be able to use raster::resample to align the grids,
>> particularly if you are coarsening the data anyway. Treating a raster
>> as polygons is asking for trouble, even with big machines. Robert
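>>
>> P.S. For example (a sketch; 'target' is a hypothetical RasterLayer
>> holding the grid you want to coarsen to, and fact = 10 is only a
>> placeholder aggregation factor):
>>
>> Br_coarse <- aggregate(Br, fact = 10, fun = mean)              # coarsen towards the target resolution
>> Br_target <- resample(Br_coarse, target, method = "bilinear")  # align to the target grid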
>>
>>
>>
>> On Wed, Jun 30, 2010 at 11:10 AM, Agustin Lobo <alobolistas at gmail.com> wrote:
>>> Yes, you are both right. Actually, shame on me: the better the
>>> machine, the more careless the user!
>>>
>>> 1. Weights are not really needed, as the polygons are much larger
>>> than the pixels. Ignoring the pixels that do not have their center
>>> in a polygon is good enough.
>>> 2. A lot (~40%) of the polygons actually lie over the ocean or over
>>> continents for which the raster has no data, so I must discard those
>>> unnecessary polygons first. I think I can do this with maptools, but
>>> I could also do it outside R. The only constraint is that I would
>>> prefer not to have broken polygons: each polygon should be either
>>> kept whole or eliminated.
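>>>
>>> Something like this might work for that filtering step (a sketch;
>>> extract() here stands in for polygonValues in later raster versions,
>>> it assumes the no-data areas of Br are coded as NA, and fact = 10 is
>>> only a placeholder to keep the check fast on a coarse copy):
>>>
>>> ra   <- aggregate(Br, fact = 10, fun = mean, na.rm = TRUE)  # coarse copy, quick to query
>>> vals <- extract(ra, eugrd025EFDC)                           # one vector of cell values per polygon
>>> keep <- !sapply(vals, function(v) all(is.na(v)))            # FALSE for polygons that only see NA
>>> eugrd025EFDC <- eugrd025EFDC[keep, ]                        # keep or drop whole polygons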
>>>
>>> The polygons actually come from a grid. The raster is a map of the
>>> % cover of Betula in Europe that we have to coarsen to a specified
>>> grid for a model of atmospheric transport that we expect will predict
>>> pollen abundance; we'll check the predictions against data from
>>> pollen sampling stations.
>>>
>>> The grid is not aligned to the raster, which is why I'm using
>>> polygons and a raster instead of 2 raster layers.
>>> But I can reconsider this if using 2 raster layers is faster.
>>>
>>> Thanks!
>>>
>>> Agus
>>>
>>> 2010/6/30 Nikhil Kaza <nikhil.list at gmail.com>:
>>>> I second that you should reconsider the weights argument; zonal
>>>> statistics are much faster.
>>>>
>>>>
>>>> In case you wanted the starspan download, here it is:
>>>>
>>>> http://projects.atlas.ca.gov/frs/?group_id=48
>>>>
>>>>
>>>> Nikhil Kaza
>>>> Asst. Professor,
>>>> City and Regional Planning
>>>> University of North Carolina
>>>>
>>>> nikhil.list at gmail.com
>>>>
>>>> On Jun 30, 2010, at 1:15 PM, Robert J. Hijmans wrote:
>>>>
>>>>> Dear Agus,
>>>>>
>>>>> You are extracting values for 18000 polygons from a high-resolution
>>>>> raster. That is going to take a while. And using "weights=TRUE" is
>>>>> also bad (in terms of processing speed!); do you really need it? You
>>>>> can do some testing by subsetting the polygons object.
>>>>>
>>>>> If the polygons are not overlapping, you could consider doing
>>>>> polygonsToRaster and then zonal. That would likely be much faster
>>>>> (but you would not have the weights).
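>>>>>
>>>>> For instance (a rough sketch; the exact arguments may differ across
>>>>> raster versions, and "id" stands for any integer column in the
>>>>> polygons' attribute data that identifies each polygon):
>>>>>
>>>>> z <- polygonsToRaster(eugrd025EFDC, Br, field = "id")  # zone grid on Br's geometry
>>>>> zstats <- zonal(Br, z, fun = "mean")                   # mean of Br per zone (= per polygon)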
>>>>>
>>>>> I have not attempted to optimize polygonValues much and 'raster' does
>>>>> not do multi-processor computations. I hope to have that implemented,
>>>>> at least for some slower functions like this one, by the end of this
>>>>> year.
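>>>>>
>>>>> In the meantime, a possible user-side workaround is to split the
>>>>> polygons into chunks and run polygonValues on each chunk in its own
>>>>> process, e.g. with mclapply (multicore package at the time, parallel
>>>>> in current R). A sketch, not tested; the chunking and core count are
>>>>> placeholders:
>>>>>
>>>>> library(parallel)
>>>>> idx <- split(seq_len(nrow(eugrd025EFDC)),
>>>>>              rep(1:8, length.out = nrow(eugrd025EFDC)))    # 8 roughly equal groups of polygons
>>>>> res <- mclapply(idx, function(i)
>>>>>   polygonValues(p = eugrd025EFDC[i, ], Br), mc.cores = 8)  # one chunk per core
>>>>> # combine the chunk results afterwards; how depends on what
>>>>> # polygonValues returns for your data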
>>>>>
>>>>> Robert
>>>>>
>>>>> On Wed, Jun 30, 2010 at 7:12 AM, Agustin Lobo <alobolistas at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi!
>>>>>> I'm trying:
>>>>>>
>>>>>>> eugrd025EFDC <- readOGR(dsn="eugrd025EFDC",layer="eugrd025EFDC")
>>>>>>
>>>>>> v <- polygonValues(p=eugrd025EFDC, Br, weights=TRUE)
>>>>>>
>>>>>> where
>>>>>>
>>>>>>> str(eugrd025EFDC,max.level=2)
>>>>>>
>>>>>> Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
>>>>>>  ..@ data       :'data.frame': 18000 obs. of  5 variables:
>>>>>>  ..@ polygons   :List of 18000
>>>>>>  .. .. [list output truncated]
>>>>>>  ..@ plotOrder  : int [1:18000] 17901 17900 17902 17903 17899 17898
>>>>>> 17904 17897 17905 17906 ...
>>>>>>  ..@ bbox       : num [1:2, 1:2] 2484331 1314148 6575852 4328780
>>>>>>  .. ..- attr(*, "dimnames")=List of 2
>>>>>>  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
>>>>>>
>>>>>>> summary(Br)
>>>>>>
>>>>>> Cells:  13967442
>>>>>> NAs  :  0
>>>>>>
>>>>>>
>>>>>> Min.       0.00
>>>>>> 1st Qu.    0.00
>>>>>> Median     0.00
>>>>>> Mean      48.82
>>>>>> 3rd Qu.    0.00
>>>>>> Max.    4999.00
>>>>>>
>>>>>> so quite large objects.
>>>>>>
>>>>>> The problem is that polygonValues() has been running (and has not
>>>>>> completed the task) for more than 2 h on an Intel Core i7 machine
>>>>>> with 16 GB RAM (Dell Precision M6500), so a pretty powerful machine.
>>>>>> Is there any way I could speed up this process?
>>>>>> Also, is there anything I could do in order to take better advantage
>>>>>> of the 8 processing threads? Currently, I see only 1 CPU working on R
>>>>>> processes while the rest remain pretty inactive.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Agus
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>


