[R-sig-Geo] Updated repository of worldmaps (300MB of data at 5km resolution)

Tomislav Hengl hengl at spatial-analyst.net
Thu Apr 1 15:54:24 CEST 2010


Dear R-sig-geo,

FYI: I have just updated the small repository of publicly available data sets of interest for global modeling/mapping that I launched about a year ago. It now contains 62 layers at a resolution of 0.05 arcdegrees with complete world coverage (it used to cover 65S-65N only). The data are available for download at [http://spatial-analyst.net/worldmaps/]. Each layer comes with a raster description file (*.rdc), which typically has the same name as the layer it describes (the fields are described in [http://spatial-analyst.net/worldmaps/README.txt]). The raster description file also includes a link to an R script that (should) show all processing steps, from download to export of the maps (I advise you to run the scripts step by step, because the data sets are usually huge). If you want to read more about what is available in this repository (and outside it), please read the complete article [http://spatial-analyst.net/wiki/index.php?title=Global_datasets]. You can also browse a gallery of world maps here:
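If you want to pull the metadata out of the *.rdc files programmatically (for example, to collect the links to the R scripts), a small parser is enough. This is a minimal sketch, assuming an IDRISI-style layout with one "key : value" pair per line; the actual field names are listed in README.txt, and the example fields and the script URL below are hypothetical placeholders.

```python
from collections import defaultdict


def parse_rdc(text):
    """Parse the 'key : value' lines of a raster description file (*.rdc).

    Assumes an IDRISI-style layout with one 'key : value' pair per line.
    Repeated keys are collected into a list, so every value is a list.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        # split only on the first colon so URLs in the value stay intact
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        fields[key.strip().lower()].append(value.strip())
    return dict(fields)


# Hypothetical excerpt; field names and the script URL are placeholders.
example = """file format : IDRISI Raster A.1
columns     : 7200
rows        : 3600
lineage     : http://example.org/script.R
"""
meta = parse_rdc(example)
print(meta["columns"], meta["lineage"])
```

Splitting on the first colon only is the design choice that keeps values such as "http://..." intact.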

http://commons.wikimedia.org/wiki/Publicly_available_global_data_sets# 

Note that some maps have limited geographical coverage (e.g. PCEVI, GLC2000), which usually means that data for the polar regions are missing.

In about 2-3 weeks, I will fix the small errors and finalize the maps and metadata. If you think I have missed some important (publicly available) layers, please let me know. For example, I also tried to include the map of airline flight paths from [http://commons.wikimedia.org/wiki/File:World_airline_routes.png], but could not determine its coordinate system, lineage, etc. I am sure that much more is available (on and off the web), but I would at least like the repository to be representative.

My next project will be to prepare the 1 km data (about 70% of the maps listed are also available at this resolution) and put them into a database format such as WKT Raster or rasdaman. This way, anybody will be able to overlay point, line and polygon features and fetch only the results of queries from the server. But it looks as if this will take more time than I initially anticipated.
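Overlaying a point on one of the 5 km layers comes down to finding the grid cell that contains it. This is a minimal sketch, assuming a global 7200 x 3600 grid whose upper-left corner is at 180W, 90N with 0.05 arcdegree cells; check the actual extent in each *.rdc file before relying on it.

```python
import math

CELL = 0.05                 # cell size in arcdegrees
XMIN, YMAX = -180.0, 90.0   # assumed upper-left corner of the global grid
NCOLS, NROWS = 7200, 3600   # 360 / 0.05 columns, 180 / 0.05 rows


def lonlat_to_cell(lon, lat):
    """Return the (row, col) index of the cell containing a lon/lat point."""
    col = int(math.floor((lon - XMIN) / CELL))
    row = int(math.floor((YMAX - lat) / CELL))  # rows count down from the north
    # clamp points lying exactly on the east or south edge into the last cell
    col = min(max(col, 0), NCOLS - 1)
    row = min(max(row, 0), NROWS - 1)
    return row, col


# e.g. a point near Amsterdam
print(lonlat_to_cell(4.89, 52.37))
```

With the layer loaded as a 2D array, the overlay value is then simply `grid[row, col]`.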


ARE THESE MAPS JUST COPIES?
Many of the layers listed (ca. 20-30%) are simply resampled and reformatted maps that are already available from the original providers (e.g. pcclim, GLC2000, himpact, etc.). The great majority, however, are original layers that you will not find elsewhere. For example, PCEVI1 is the first principal component of the complete time series of monthly MODIS EVI bands (this image basically shows the average long-term 'biomass' in the world). If you wish to cite one of the maps I have prepared, please refer to chapter 4 of my book [http://spatial-analyst.net/book/DataSources]; otherwise, I advise you to refer to the original data providers.
Each *.rdc file contains information about the data source, including a link to the source data and to the peer-reviewed publication that describes the data set.
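To show what "first principal component of a time series" means for a layer like PCEVI1, here is a toy sketch with NumPy. It treats pixels as observations and months as variables, and takes the eigenvector of the month-by-month covariance matrix with the largest eigenvalue. The PCEVI layers were produced in ILWIS, so this is not that workflow; whether the original bands were standardized is unknown, and unstandardized (covariance-based) PCA is an assumption here.

```python
import numpy as np


def first_principal_component(stack):
    """PC1 scores for a (n_months, n_rows, n_cols) stack of monthly images."""
    n_months, n_rows, n_cols = stack.shape
    X = stack.reshape(n_months, -1).T          # pixels x months
    X = X - X.mean(axis=0)                     # centre each month (variable)
    cov = np.cov(X, rowvar=False)              # months x months covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    loadings = eigvecs[:, -1]                  # eigenvector of largest eigenvalue
    if loadings.sum() < 0:                     # eigenvector sign is arbitrary
        loadings = -loadings
    scores = X @ loadings                      # projection of each pixel on PC1
    return scores.reshape(n_rows, n_cols)


# Synthetic test: a per-pixel long-term level plus a shared seasonal cycle.
rng = np.random.default_rng(0)
level = rng.uniform(0, 1, size=(20, 30))
seasonal = 0.1 * np.sin(np.linspace(0, 2 * np.pi, 24, endpoint=False))
stack = level[None] + seasonal[:, None, None] + rng.normal(0, 0.01, (24, 20, 30))
pc1 = first_principal_component(stack)
```

In the synthetic example the seasonal cycle is common to all pixels and cancels out after centring, so PC1 recovers the long-term level of each pixel. Note that the covariance matrix is only 24 x 24, but the centred pixel matrix still has to fit in memory.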
Personally, I find it frustrating that several global mapping projects in the world overlap (for example, there are at least four global land cover maps!). In some cases I solved the problem by simply taking the average (e.g. globedem is an average of two maps), but categorical maps cannot be averaged as easily. My second frustration is licensing and copyright. Some data producers (usually the US mapping agencies) have a very clear policy and even support copying and distribution of the maps they produce (provided that the source is acknowledged); others (e.g. himpacts) are not really clear. I am only interested in processing and organizing publicly available data.
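Averaging two overlapping continuous grids (as for globedem) needs some care with missing values. This is a minimal sketch, assuming both grids are already aligned on the same cells and use -9999 as the no-data value; where only one source has data it takes that value, which is an assumption and not necessarily how globedem was built.

```python
import numpy as np


def average_grids(a, b, nodata=-9999.0):
    """Cell-wise mean of two aligned grids, ignoring no-data cells."""
    a = np.where(a == nodata, np.nan, a.astype(float))
    b = np.where(b == nodata, np.nan, b.astype(float))
    total = np.nansum([a, b], axis=0)          # NaN cells contribute zero
    count = (~np.isnan(a)).astype(int) + (~np.isnan(b)).astype(int)
    out = np.full(a.shape, nodata)
    valid = count > 0                          # at least one source has data
    out[valid] = total[valid] / count[valid]
    return out


a = np.array([[100.0, -9999.0], [50.0, -9999.0]])
b = np.array([[200.0, 30.0], [-9999.0, -9999.0]])
res = average_grids(a, b)
```

This only makes sense for continuous variables such as elevation. For categorical maps such as land cover, the mean of two class codes is meaningless, which is why they cannot be combined this way.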


MEMORY LIMIT PROBLEMS
Going from 10 km to 5 km resolution caused many technical headaches. Just downloading the input data takes about one week (the input data I used to generate the 62 layers, now 300MB in total, amount to about 500GB!). Each layer now has ca. 26M pixels, which will obviously lead to many memory and computational problems. For example, I doubt that you will be able to load these data into R on a standard PC (2-4GB RAM) or visualize the maps using spplot. I also tried to derive some DEM parameters, such as the SAGA TWI and height above channel network, but the maps are simply too big (processing takes >24 hours), so you will very likely also face memory and computational limits on your computers. PS: I ran the processing on a dual-processor 2.8GHz Dell with 64-bit Windows XP and 4GB of RAM, and this configuration was already on the edge.
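The ca. 26M pixels per layer follow directly from the resolution, and a quick calculation shows why a 2-4GB machine struggles. This sketch assumes each value is held in memory as an 8-byte double (R's default "numeric" type) and ignores the extra copies that R typically makes during processing, so the figures are a lower bound.

```python
CELL = 0.05           # resolution in arcdegrees
N_LAYERS = 62
BYTES_PER_DOUBLE = 8  # R 'numeric' vectors store 8-byte doubles

ncols = round(360 / CELL)
nrows = round(180 / CELL)
pixels = ncols * nrows

layer_mb = pixels * BYTES_PER_DOUBLE / 1e6
all_layers_gb = N_LAYERS * pixels * BYTES_PER_DOUBLE / 1e9

print(f"{ncols} x {nrows} = {pixels:,} pixels per layer")
print(f"{layer_mb:.2f} MB per layer, {all_layers_gb:.2f} GB for all {N_LAYERS} layers")
```

This prints 7200 x 3600 = 25,920,000 pixels per layer, 207.36 MB per layer and 12.86 GB for all 62 layers.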

I am really thankful to Frank Warmerdam and colleagues for providing the excellent GDAL utilities [http://www.gdal.org/gdal_utilities.html], which I used heavily to prepare the maps. I ran a small comparison between the gdalwarp utility, Erdas Imagine and ArcGIS, and found that the GDAL utilities are (1) faster and (2) easier to script (plus they support proj4 strings and the largest family of GIS data formats). Second on my list is SAGA GIS, which can also crunch huge data (up to 2GB) and has a large library of GIS operations. I highly recommend these two programs and would very much support their further development. In some cases I could not find the functionality I needed in the GDAL utilities or SAGA, so I used ILWIS GIS (e.g. to run principal component analysis and to derive the density of line and point features). Unfortunately, linking R and ILWIS is not as smooth, so I often ended up running parts of the analysis in ILWIS separately. This is important to know for anyone who plans to rely only on the R scripts.
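Because gdalwarp is a command-line tool, resampling many layers to the same global 0.05 arcdegree grid is easy to script. This is a minimal sketch that only builds the argument list with standard gdalwarp options (-t_srs, -te, -tr, -r, -of); the exact options used for the worldmaps are not stated here, and the file names are hypothetical.

```python
import shlex


def gdalwarp_args(src, dst, cell=0.05, method="bilinear"):
    """Build a gdalwarp command that resamples `src` onto a global lat/lon grid."""
    return [
        "gdalwarp",
        "-t_srs", "EPSG:4326",              # target: geographic WGS84
        "-te", "-180", "-90", "180", "90",  # target extent: xmin ymin xmax ymax
        "-tr", str(cell), str(cell),        # target cell size in arcdegrees
        "-r", method,                       # resampling method
        "-of", "GTiff",                     # output format
        src, dst,
    ]


# Hypothetical file names; use method="near" for categorical maps.
args = gdalwarp_args("input.tif", "output_5km.tif")
print(shlex.join(args))
# To run it (requires GDAL on the PATH):
# import subprocess; subprocess.run(args, check=True)
```

Fixing the extent and cell size for every layer guarantees that all outputs share the same 7200 x 3600 grid. Nearest-neighbour resampling keeps class codes intact for land cover maps, while bilinear would invent intermediate classes.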

I am looking forward to your feedback and further comments. 

A copy of this mail in HTML format (where you can add comments) is also available here:

http://spatial-analyst.net/book/updated_worldmaps 

yours,

T. Hengl
http://home.medewerker.uva.nl/t.hengl/


