[R-sig-Geo] polygonValues (raster): Very slow

Robert J. Hijmans r.hijmans at gmail.com
Wed Jun 30 20:20:32 CEST 2010


Agus, you should be able to use raster::resample to align the grids,
particularly if you are coarsening the data anyway. Treating a raster
as polygons is asking for trouble, even with big machines. Robert
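
A minimal sketch of the raster-to-raster route suggested above, assuming a
recent version of the raster package. The objects fine and target are
hypothetical stand-ins for the Betula cover raster and the model grid, and
the dimensions and the aggregation factor are made up for illustration. The
data are first coarsened with aggregate and then aligned with resample.

```r
library(raster)

set.seed(1)

# Hypothetical stand-in for the fine %cover raster (Br in the thread):
# 100 x 100 cells of size 0.1 covering the square 0..10.
fine <- raster(nrows = 100, ncols = 100, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
values(fine) <- runif(ncell(fine), min = 0, max = 100)

# Hypothetical model grid: cells of size 1, shifted by half a cell so that
# it is not aligned with the fine raster.
target <- raster(nrows = 9, ncols = 9, xmn = 0.5, xmx = 9.5, ymn = 0.5, ymx = 9.5)

# Step 1: coarsen to roughly the target resolution by averaging 10 x 10 blocks.
coarse <- aggregate(fine, fact = 10, fun = mean)

# Step 2: interpolate the coarsened values onto the target grid geometry.
aligned <- resample(coarse, target, method = "bilinear")

print(aligned)
```

With real data, target would be built from the extent, resolution and CRS of
the model grid, and the choice of fact and method depends on how the percent
cover should be averaged.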



On Wed, Jun 30, 2010 at 11:10 AM, Agustin Lobo <alobolistas at gmail.com> wrote:
> Yes, you are both right. Actually, shame on me: the better the
> machine, the more careless the user!
>
> 1. Weights are not really needed, as the polygons are much larger than
> the pixels. Ignoring those pixels not
> having their center in the polygon is good enough.
> 2. A lot (~40%) of the polygons actually lie over the ocean or over
> continents for which the raster
> has no data. Therefore I must discard the unnecessary polygons first.
> I think I can do this with maptools,
> but can do it outside R as well. The only problem is that I would
> prefer not to have broken polygons, so
> each polygon should be either kept whole or eliminated.
>
> The polygons actually come from a grid. The raster is a map of %cover
> of Betula in Europe that we
> have to coarsen to a specified grid for a model of atmospheric
> transport that we expect will predict
> pollen abundance, which we'll check against data from pollen sampling stations.
>
> The grid is not aligned to the raster; this is why I'm using a polygon
> and a raster instead of 2 raster layers.
> But I can reconsider this if using 2 raster layers is faster.
>
> Thanks!
>
> Agus
>
> 2010/6/30 Nikhil Kaza <nikhil.list at gmail.com>:
>> I second that you should reconsider the weights argument, and zonal
>> statistics are much faster.
>>
>>
>> In case you want to download starspan, here it is:
>>
>> http://projects.atlas.ca.gov/frs/?group_id=48
>>
>>
>> Nikhil Kaza
>> Asst. Professor,
>> City and Regional Planning
>> University of North Carolina
>>
>> nikhil.list at gmail.com
>>
>> On Jun 30, 2010, at 1:15 PM, Robert J. Hijmans wrote:
>>
>>> Dear Agus,
>>>
>>> You are extracting values for 18000 polygons for a high res raster.
>>> That is going to take a while. And using "weights=TRUE" is also bad
>>> (in terms of processing speed!); do you really need it? You can do
>>> some testing by subsetting the polygons object.
>>>
>>> If the polygons are not overlapping, you could consider running
>>> polygonsToRaster and then zonal. That would likely be much faster (but
>>> you would not have the weights).
>>>
>>> I have not attempted to optimize polygonValues much and 'raster' does
>>> not do multi-processor computations. I hope to have that implemented,
>>> at least for some slower functions like this one, by the end of this
>>> year.
>>>
>>> Robert
>>>
>>> On Wed, Jun 30, 2010 at 7:12 AM, Agustin Lobo <alobolistas at gmail.com>
>>> wrote:
>>>>
>>>> Hi!
>>>> I'm trying:
>>>>
>>>>> eugrd025EFDC <- readOGR(dsn="eugrd025EFDC",layer="eugrd025EFDC")
>>>>
>>>> v <- polygonValues(p=eugrd025EFDC, Br, weights=TRUE)
>>>>
>>>> where
>>>>
>>>>> str(eugrd025EFDC,max.level=2)
>>>>
>>>> Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
>>>>  ..@ data       :'data.frame': 18000 obs. of  5 variables:
>>>>  ..@ polygons   :List of 18000
>>>>  .. .. [list output truncated]
>>>>  ..@ plotOrder  : int [1:18000] 17901 17900 17902 17903 17899 17898
>>>> 17904 17897 17905 17906 ...
>>>>  ..@ bbox       : num [1:2, 1:2] 2484331 1314148 6575852 4328780
>>>>  .. ..- attr(*, "dimnames")=List of 2
>>>>  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
>>>>
>>>>> summary(Br)
>>>>
>>>> Cells:  13967442
>>>> NAs  :  0
>>>>
>>>>
>>>> Min.       0.00
>>>> 1st Qu.    0.00
>>>> Median     0.00
>>>> Mean      48.82
>>>> 3rd Qu.    0.00
>>>> Max.    4999.00
>>>>
>>>> so quite large objects.
>>>>
>>>> The problem is that polygonValues() has been running (and has not
>>>> completed the task) for
>>>> more than 2 h on an Intel Core i7 machine with 16 GB RAM (Dell
>>>> Precision M6500), so a pretty powerful machine.
>>>> Is there any way I could speed up this process?
>>>> Also, is there anything I could do in order to take better advantage
>>>> of the 8 processing threads?
>>>> Currently, I see only 1 CPU working for R processes and the rest
>>>> remain pretty inactive.
>>>>
>>>> Thanks
>>>>
>>>> Agus
>>>>
>>>> _______________________________________________
>>>> R-sig-Geo mailing list
>>>> R-sig-Geo at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>
>>>
>>
>>
>
