[R-sig-Geo] Faster way to extract raster values using multiple polygons?

Sat Apr 9 05:56:14 CEST 2016

Hi Ben,

Thank you so much for the suggestion. 

It is pretty fast, and I guess it's a good assumption for temperature. I'll have to do the same with precipitation, which is way more spatially variable than temperature. In that case, I think I'll have to wait the 10 hours it takes to run "extract" using a mean value of all cells covering a polygon.

It is a one-time processing anyway, so I should not worry too much about the waiting time.

Greetings,
-- Thiago V. dos Santos

PhD student
Land and Atmospheric Science
University of Minnesota

On Wednesday, April 6, 2016 10:48 AM, Ben Tupper <btupper at bigelow.org> wrote:
Hi,

The resolution of your raster data (1 degree) is much more coarse than what your polygons represent.  Could you short-circuit the process by assuming that the temp at the centroid of each polygon would suitably represent the mean temperature across each polygon?  Unless you have some much bigger polygons, I can't imagine it will be very far off. If so, then you could pretty quickly extract the values for each layer in the raster at each centroid.  Perhaps like this?

cents <- coordinates(br_sub)
v <- extract(b, cents)

Is that close enough?

Cheers,
Ben

> On Apr 6, 2016, at 8:00 AM, Thiago V. dos Santos via R-sig-Geo <r-sig-geo at r-project.org> wrote:
> 
> Dear all,
> 
> I am trying to extract a time series from a raster brick using multiple polygons.
> 
> The raster brick is a global temperature netcdf file with 1200 layers (months) and the polygons are each of the 5567 municipalities in Brazil. I intend to extract the temperature time series for each municipality.
> 
> As a final result, I would like to have a data frame containing: -the name and coordinates of the municipality, -the date tag from the raster, and -the extracted temperature value.
> 
> My initial, VERY SLOW attempt was something like this:
> 
> 
> library(raster)
> 
> # Create some sample raster data
> idx <- seq(as.Date("1960/1/1"), as.Date("1990/12/01"), by = "month")
> r <- raster(ncol=360, nrow=180)
> b <- brick(lapply(1:length(idx), function(x) setValues(r, runif(ncell(r)))))
> b <- setZ(b, idx)
> 
> # Import polygon data
> br <- getData("GADM", country="Brazil", level=2) # about 7MB to download
> 
> # Subset the SMALLEST state - only 75 municipalities
> br_sub <- br[br$NAME_1=="Sergipe" & !is.na(br$NAME_1),]
> plot(br_sub)
> 
> # Now let's extract the data for each municipality
> beginCluster()
> e <- extract(b, br_sub, fun=mean, df=T)
> endCluster()
> 
> 
> Even using the smallest state possible, this example takes about 20 minutes to run on a dual-core 2.5GHz Macbook Pro using four threads. As a reference, there are states with over 850 municipalities. And remember, in total there are over 5500 municipalities I need to extract the data for.
> 
> I have the feeling that my code is not very optimized.
> 
> Has anybody ever dealt with this amount of data in this kind of raster operation? Is there any fastest way to achieve my expected result?
> Thanks in advance,
> -- Thiago V. dos Santos
> 
> PhD student
> Land and Atmospheric Science
> 
> University of Minnesota
> 
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org