[R-sig-Geo] Optimized rasterOptions() for a (virtually) infinite RAM machine
Thiago V. dos Santos
thi_veloso at yahoo.com.br
Sat Sep 23 05:58:09 CEST 2017
I am using the raster package to process a total of 32 daily climate files supplied in netCDF format. Each file is a raster brick with 100 rows x 95 cols x 54750 time slices and is about 1.5 GB on disk.
Essentially, all the processing I perform on each netCDF file is:
a) subset a specific date range
b) extract values at a set of points
After that, I just convert the extracted data to data.tables and keep working in that format.
Since I extract data for about 450 points and append all of it to one huge data.table, I need to use a computer with as much RAM as possible.
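For readers who want to reproduce the workflow described above, here is a minimal sketch of one file's processing. It is not the author's code: to keep it self-contained it builds a small synthetic brick on the same 100 x 95 grid with one year of daily layers, and the extent, the date range and the 450 random points are all hypothetical. With real data the first lines would be replaced by `brick("your_file.nc")` (placeholder file name).

```r
library(raster)
library(data.table)

# Hypothetical stand-in for one netCDF brick: 100 x 95 cells, 366 daily layers.
# With a real file: b <- brick("your_file.nc")  (placeholder name)
set.seed(42)
b <- brick(nrows = 100, ncols = 95, nl = 366,
           xmn = -100, xmx = -5, ymn = -50, ymx = 50)
b <- setValues(b, matrix(runif(ncell(b) * nlayers(b)), ncol = nlayers(b)))
dates <- seq(as.Date("2000-01-01"), by = "day", length.out = nlayers(b))
b <- setZ(b, dates, name = "time")

# a) subset a date range (hypothetical: 1 March to 31 May 2000)
idx <- which(dates >= as.Date("2000-03-01") & dates <= as.Date("2000-05-31"))
sub <- subset(b, idx)

# b) extract values at ~450 points (hypothetical random coordinates)
pts  <- cbind(x = runif(450, -100, -5), y = runif(450, -50, 50))
vals <- raster::extract(sub, pts)  # matrix: one row per point, one column per layer

# Long data.table: as.vector() walks the matrix column by column,
# i.e. all points for the first date, then all points for the next date, ...
dt <- data.table(point = rep(seq_len(nrow(pts)), times = ncol(vals)),
                 date  = rep(dates[idx], each = nrow(pts)),
                 value = as.vector(vals))
```

Wrapping these steps in a function and calling `rbindlist(lapply(files, f))` is one way to append the results of all 32 files into a single data.table without growing it row by row.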
I ended up using a spot instance on Amazon EC2. Using an instance with 32 cores and 244GB of RAM will cost me around $0.30/hour.
Since I will be charged per hour, I need to optimize my code to get my results as fast as possible.
I don't even copy my data to the instance's hard disk; I send the files directly to the RAM disk (/dev/shm). Even after using 48 GB of RAM disk to store the files, I'll still have 196 GB of RAM.
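The RAM budget above can be checked with a quick calculation. A brick held in memory can be larger than its file: assuming raster keeps the values as R doubles (8 bytes per cell) and ignoring any extra working copies, one brick needs roughly 4.2 GB, and all 32 at once would need about 133 GB of the 196 GB left after the RAM disk.

```r
# Back-of-the-envelope memory budget.
# Assumption: values held in memory as R doubles (8 bytes each), no extra copies.
n_rows  <- 100
n_cols  <- 95
n_days  <- 54750
n_files <- 32

cells_per_brick <- n_rows * n_cols * n_days   # 520,125,000 cells
bytes_per_brick <- cells_per_brick * 8        # about 4.16 GB per brick in RAM
bytes_all       <- bytes_per_brick * n_files  # about 133 GB for all 32 bricks
ramdisk_gb      <- n_files * 1.5              # 48 GB of netCDF files on /dev/shm
free_gb         <- 244 - ramdisk_gb           # 196 GB left for R

cat(sprintf("per brick: %.2f GB; all bricks: %.1f GB; free RAM: %.0f GB\n",
            bytes_per_brick / 1e9, bytes_all / 1e9, free_gb))
```

Under these assumptions, processing one file at a time leaves ample headroom for the growing data.table, whereas loading every brick simultaneously would consume most of the available memory.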
Given virtually unlimited RAM, what would be the best rasterOptions() settings to make sure all my rasters are processed in memory? Any other tips for benefiting from such a large amount of RAM?
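As a starting point for the question above, the sketch below shows the kind of rasterOptions() call involved. The specific values are hypothetical, not recommendations, and the units are an assumption to verify: recent versions of raster treat `maxmemory` as bytes and `memfrac` as the fraction of available RAM raster may use, while older versions counted `maxmemory` in cells (and may lack `memfrac`). `canProcessInMemory()`, `inMemory()` and `readAll()` are used to check and force in-memory processing.

```r
library(raster)

# Hypothetical temp directory on the RAM disk, falling back to R's tempdir().
tmp <- if (dir.exists("/dev/shm")) "/dev/shm/raster_tmp" else file.path(tempdir(), "raster_tmp")
dir.create(tmp, showWarnings = FALSE, recursive = TRUE)

# Assumption: maxmemory in bytes and memfrac available (recent raster versions).
# Check ?rasterOptions for the installed version before relying on these units.
rasterOptions(maxmemory = 150e9,  # roughly 150 GB
              memfrac   = 0.8,    # let raster use up to 80% of free RAM
              tmpdir    = tmp)

# Synthetic brick for illustration; with a real file use brick("your_file.nc").
b <- brick(nrows = 100, ncols = 95, nl = 10,
           xmn = -100, xmx = -5, ymn = -50, ymx = 50)
b <- setValues(b, matrix(runif(ncell(b) * nlayers(b)), ncol = nlayers(b)))

# canProcessInMemory() asks whether n copies of b fit under the current limits;
# readAll() loads a file-backed brick fully into memory.
if (!inMemory(b) && canProcessInMemory(b, n = 2)) b <- readAll(b)
```

Forcing `readAll()` up front means later `subset()` and `extract()` calls work on values already in RAM instead of re-reading the netCDF file in chunks.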
Thanks,
--
Thiago V. dos Santos
Postdoctoral Research Fellow
Department of Climate and Space Science and Engineering
University of Michigan