[R-sig-Geo] Faster way to extract raster values using multiple polygons?

Thiago V. dos Santos thi_veloso at yahoo.com.br
Wed Apr 6 14:00:44 CEST 2016


Dear all,

I am trying to extract a time series from a raster brick using multiple polygons.

The raster brick is a global temperature netcdf file with 1200 layers (months) and the polygons are each of the 5567 municipalities in Brazil. I intend to extract the temperature time series for each municipality.

As a final result, I would like to have a data frame containing: -the name and coordinates of the municipality, -the date tag from the raster, and -the extracted temperature value.

My initial, VERY SLOW attempt was something like this:


library(raster)

# Create some sample raster data
idx <- seq(as.Date("1960/1/1"), as.Date("1990/12/01"), by = "month")
r <- raster(ncol=360, nrow=180)
b <- brick(lapply(1:length(idx), function(x) setValues(r, runif(ncell(r)))))
b <- setZ(b, idx)

# Import polygon data
br <- getData("GADM", country="Brazil", level=2) # about 7MB to download

# Subset the SMALLEST state - only 75 municipalities
br_sub <- br[br$NAME_1=="Sergipe" & !is.na(br$NAME_1),]
plot(br_sub)

# Now let's extract the data for each municipality
beginCluster()
e <- extract(b, br_sub, fun=mean, df=T)
endCluster()


Even using the smallest state possible, this example takes about 20 minutes to run on a dual-core 2.5GHz Macbook Pro using four threads. As a reference, there are states with over 850 municipalities. And remember, in total there are over 5500 municipalities I need to extract the data for.

I have the feeling that my code is not very optimized.

Has anybody ever dealt with this amount of data in this kind of raster operation? Is there any fastest way to achieve my expected result?
Thanks in advance,
-- Thiago V. dos Santos

PhD student
Land and Atmospheric Science

University of Minnesota



More information about the R-sig-Geo mailing list