[R-sig-Geo] Help on slow processing of extract function

Roger Bivand Roger.Bivand at nhh.no
Mon Feb 21 10:50:34 CET 2011

On Mon, 21 Feb 2011, Rahul Raj wrote:

> On Mon, Feb 21, 2011 at 3:18 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>> On Mon, 21 Feb 2011, Rahul Raj wrote:
>>> Dear All,
>>> I am using the "extract" command to extract the fraction of each class
>>> in the raster cells falling in each polygon of my shapefile grid.
>>>> extract(r, v) ### where 'v' is my polygon grid at 1 km resolution and
>>> 'r' is my classified raster image.
>>> My polygon grid has more than 2 million polygons, and I am running the
>>> command on a 64-bit Windows XP machine with a quad-core processor and
>>> 8 GB RAM. But the extract function is taking a long time (more than 2
>>> days) to generate results.
>>> Please help me speed up the processing. Are there any other packages
>>> in R that provide faster processing for this type of extraction?
>>> Also, how can I read one polygon at a time in R from a polygon
>>> shapefile stored on disk?
>> Please, do try to think carefully before beginning work. What are you
>> actually doing? Are your polygon objects really irregular multipolygons, or
>> are they rectangles? Do they overlap? If I recall, you are looping over the
>> polygons - did you time a single polygon and say 100 polygons taken
>> together? You are trying to use R/raster/sp for tasks that are very
>> different from those they were designed for, in a very demanding way (loops
>> are known to be inferior to vectorisation in R).
>> So you run extract() on your whole raster for each polygon, which
>> internally crops your raster to the polygon, rasterises it, turns it into
>> points, and extracts the values at those rasterised points from the
>> raster. In large-scale applications like yours, almost certainly one
>> should use different data and process models.
>> Doing this 2 million times is not efficient, as you are experiencing. I
>> don't know why you thought it would be. How to make it more efficient
>> will depend on your data structures, and possibly on coding in R, or R
>> and C. extract() will work perfectly well on your raster with, say, 500
>> polygons, but does not seem to have been written for very many polygons.
>> Roger
>>> Thanks in advance for the valuable suggestions.
>>> With regards
>>> Rahul Raj
>>> Indian Institute of Remote Sensing, India.
>> --
>> Roger Bivand
>> Economic Geography Section, Department of Economics, Norwegian School of
>> Economics and Business Administration, Helleveien 30, N-5045 Bergen,
>> Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
>> e-mail: Roger.Bivand at nhh.no
> Dear Roger
> I have a classified image of the whole of India (India measures 3,214 km
> (1,997 mi) from north to south and 2,993 km (1,860 mi) from east to west)
> at 56 m resolution. I have generated a square polygon grid (the area of
> each square polygon in the grid is 1 sq km) that covers the whole of
> India. Now I have to extract the fraction of crop classes from the
> classified image for each square polygon in the grid.

My immediate impression is that you are going to have to loop anyway, but 
that you can avoid all the cropping by using the lower-level functions in 
rgdal. The function GDAL.open() opens a handle to a dataset - you must 
close the dataset with GDAL.close() when done. Then use the offset=c(yrow, 
xcol) argument to getRasterTable() to step over the dataset in appropriate 
chunks. Train first on a part of the map you know well, to be sure of what 
is going on. Then just loop over your offsets and aggregate the values 
returned by getRasterTable(), which are the cell coordinates and the land 
cover value of interest.

This exploits the fact that your polygons are squares on a regular grid, 
so they can be emulated by setting, for example, region.dim=c(20, 20) and 
iterating over offset=. You can tag your output data with the centroid of 
the 400 cells retrieved each time. This gives you access to all the land 
cover classes in each 1 km square. Because 20 x 56 m is 1120 m, you may 
want to choose say 18 by 18 for region.dim=, which is 1008 m by 1008 m.

You might also decimate your input grid with output.dim= to let you train 
on a smaller data set that is easier to visualise.
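For example, to train on a decimated version first (again with an assumed 
file name):

```r
library(rgdal)

ds <- GDAL.open("classified_india.tif")  # hypothetical file name
## read the full extent decimated to about 500 x 500 cells
overview <- getRasterData(ds, output.dim = c(500, 500))
GDAL.close(ds)
image(overview)   # quick visual check
table(overview)   # rough class distribution
```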

For the global landcover dataset:

> library(rgdal)
> dataset <- GDAL.open("GLOBCOVER_L4_200901_200912_V2.3.tif")
> dim(dataset)
[1]  55800 129600
> displayDataset(dataset, band = 1, output.dim=c(180, 360))

shows me the whole dataset, and:

> displayDataset(dataset, band = 1, offset=c(1000, 2000), 
+ region.dim=c(20000, 40000), output.dim=c(180, 360))

shows northern North America. This dataset is otherwise too large to 
handle whole, but getRasterTable() and getRasterData() - which returns 
just the data as a matrix - are pretty fast.

Remember to do GDAL.close(dataset) when done.

Hope this helps,


PS. library(fortunes); fortune("Yoda")

> For this purpose I thought to use the extract command. As the computer's
> memory cannot handle such a large grid, I divided my grid according to
> the states of India (India has 28 states). I used the clip operation in
> ArcGIS to clip state-wise grids (from my large grid) using the vector
> boundary of each state. At the state boundaries the grids (which still
> have 0.1 to 2 million polygons) lose the square shape of the polygons.
> I have used the rgdal package to import this subset grid (state-wise). I
> am running the 'extract' command in a loop so that the extraction of
> class fractions (from the classified map) runs on one polygon at a time
> in each iteration (I did this to avoid running out of memory).
> I learned from reading your mail that my approach is not appropriate.
> I am also getting errors like "Error in gc(): caught access violation -
> continue with care" when the loop over the extract function starts.
> I am a beginner in R programming and joined R-sig-Geo just a few days
> ago. Please help me sort out my problem in handling a large grid in R.
> Please let me know if you still have doubts about understanding my data.
> With regards
> Rahul Raj
> Indian Institute of Remote Sensing, India.

Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no

More information about the R-sig-Geo mailing list