[R-sig-Geo] polygonValues (raster): Very slow

Agustin Lobo alobolistas at gmail.com
Wed Jun 30 20:35:24 CEST 2010


Yes, right. Once not using weights, keep using the vector is not that
much advantage.
Except that with the vector I can discard the cells for which the
original raster has no data.
But this is not going to compensate. I'll try both and let you know.

Thanks a lot for your help.

Agus

2010/6/30 Robert J. Hijmans <r.hijmans at gmail.com>:
> Agus, You should be able to use raster::resample to align the grids;
> particularly if you are coarsening the data anyway. Treating a raster
> as polygons is asking for trouble; even with big machines. Robert
>
>
>
> On Wed, Jun 30, 2010 at 11:10 AM, Agustin Lobo <alobolistas at gmail.com> wrote:
>> Yes, you are both right. Actually, shame to me: the better the
>> machine, the more careless the user!
>>
>> 1. Weights are not really needed, as the polygons are much larger than
>> the pixels. Ignoring those pixels not
>> having their center in the polygon is good enough.
>> 2. A lot (~40%) of the polygons actually lie over the ocean or over
>> continents for which the raster
>> has no data. Therefore I must discard the unnecessary polygons first.
>> I think I can do this with maptools,
>> but can do it outside R as well. The only problem is that I would
>> prefer not having broken polygons, so
>> polygons should be either kept or eliminated.
>>
>> The polygons actually come from a grid. The raster is a map of %cover
>> of Betula in Europe that we
>> have to coarsen to an specified grid for a model of atmospheric
>> transport that we expect will predict
>> pollen abundance, which we'll check against data from pollen sampling stations.
>>
>> The grid is not aligned to the raster, this is why I'm using a polygon
>> and a raster instead of 2 raster layers.
>> But I can reconsider this if using 2 raster layer is faster.
>>
>> Thanks!
>>
>> Agus
>>
>> 2010/6/30 Nikhil Kaza <nikhil.list at gmail.com>:
>>> I second that you should reconsider weights argument and zonal statistics
>>> are much faster.
>>>
>>>
>>> In case you wanted starspan download here it is
>>>
>>> http://projects.atlas.ca.gov/frs/?group_id=48
>>>
>>>
>>> Nikhil Kaza
>>> Asst. Professor,
>>> City and Regional Planning
>>> University of North Carolina
>>>
>>> nikhil.list at gmail.com
>>>
>>> On Jun 30, 2010, at 1:15 PM, Robert J. Hijmans wrote:
>>>
>>>> Dear Agus,
>>>>
>>>> You are extracting values for 18000 polygons for a high res raster.
>>>> That is going to take a while. And using "weights=TRUE" is also bad
>>>> (in terms of processing speed!); do you really need it?. You can do
>>>> some testing by subsetting the polygons object.
>>>>
>>>> If the polygons are not overlapping, you could consider to do
>>>> polygonsToRaster and then zonal. That would likely be much faster (but
>>>> you would not have the weights).
>>>>
>>>> I have not attempted to optimize polygonValues much and 'raster' does
>>>> not do multi-processor computations. I hope to have that implemented,
>>>> at least for some slower functions like this one, by the end of this
>>>> year.
>>>>
>>>> Robert
>>>>
>>>> On Wed, Jun 30, 2010 at 7:12 AM, Agustin Lobo <alobolistas at gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi!
>>>>> I'm trying:
>>>>>
>>>>>> eugrd025EFDC <- readOGR(dsn="eugrd025EFDC",layer="eugrd025EFDC")
>>>>>
>>>>> v <- polygonValues(p=eugrd025EFDC, Br, weights=TRUE)
>>>>>
>>>>> where
>>>>>
>>>>>> str(eugrd025EFDC,max.level=2)
>>>>>
>>>>> Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
>>>>>  ..@ data       :'data.frame': 18000 obs. of  5 variables:
>>>>>  ..@ polygons   :List of 18000
>>>>>  .. .. [list output truncated]
>>>>>  ..@ plotOrder  : int [1:18000] 17901 17900 17902 17903 17899 17898
>>>>> 17904 17897 17905 17906 ...
>>>>>  ..@ bbox       : num [1:2, 1:2] 2484331 1314148 6575852 4328780
>>>>>  .. ..- attr(*, "dimnames")=List of 2
>>>>>  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
>>>>>
>>>>>> summary(Br)
>>>>>
>>>>> Cells:  13967442
>>>>> NAs  :  0
>>>>>
>>>>>
>>>>> Min.       0.00
>>>>> 1st Qu.    0.00
>>>>> Median     0.00
>>>>> Mean      48.82
>>>>> 3rd Qu.    0.00
>>>>> Max.    4999.00
>>>>>
>>>>> so quite large objects.
>>>>>
>>>>> The problem is that  polygonValues() has been running (and not
>>>>> completed the task) for
>>>>> more than 2 h on a intel core i7 machine with 16 Gb RAM (Dell
>>>>> Precision M6500), so a pretty powerful machine.
>>>>> Is there any way I could speed up this process?
>>>>> Also, is there anything I could do in order to take better advantage
>>>>> of the 8 processing threads?
>>>>> Currently, I see only 1 cpu working for R processes and the rest
>>>>> remain pretty inactive
>>>>>
>>>>> Thanks
>>>>>
>>>>> Agus
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-Geo mailing list
>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>
>>>>
>>>> _______________________________________________
>>>> R-sig-Geo mailing list
>>>> R-sig-Geo at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>>
>>
>



More information about the R-sig-Geo mailing list