[R-sig-Geo] [raster newbie] regridding global latlong -> CONUS LCC with projectRaster
Tom Roche
Tom_Roche at pobox.com
Fri Oct 5 01:39:26 CEST 2012
summary: I present a (hopefully) replicable self-contained example of
practical projectRaster use, which
1 inputs from a netCDF file (gridded global lon-lat) with a single
data variable (datavar)
2 creates an input raster from the input netCDF
3 creates a template raster from a second netCDF (gridded LCC over
CONUS)
4 uses the input and template rasters to write an output netCDF.
I note results inline, about which I have 4 questions (the content of
which may be due to my lack of geospatial experience):
1 The data in the input NetCDF datavar and the input raster should be
similar, no? Yet their respective mins and maxs differ by ~3
magnitudes. Why?
2 In creating a template raster using projectExtent(), I get a
"projected point(s) not finite" warning: is it important?
3 Why do the missing and fill values in the input and output netCDF
differ?
4 Why are there negative values in the output netCDF, while there were
none in either the input netCDF or the input raster?
details:
Tom Roche Wed, 03 Oct 2012 11:34:58 -0400
>> I'm getting very wrong output
Robert J. Hijmans Wed, 3 Oct 2012 20:27:07 -0700
> I do not know why you say the output is wrong. Why you would expect
> that input variables would have to be "preserved"
You're right, I should have worded that better. I also note that
* since the original problem (hang, due to user error) is fixed, this
should be in a new thread.
* I should have used an extents template from the beginning (though
projectExtent may still have a problem with my IOAPI file).
So I'll now present a (hopefully) replicable example, with results
inline, and raise specific concerns. The first part of the example is a
bash scriptlet that makes a test folder and populates it with some
downloadable files; the second part runs in an R console.
# begin example-------------------------------------------------------
# test folder, made/destroyed
export TEST_DIR='/tmp/projectRasterTest'
rm -fr ${TEST_DIR}
mkdir -p ${TEST_DIR}
# Input data is GEIA marine N2O emissions converted to netCDF:
# see code @ https://github.com/TomRoche/GEIA_to_netCDF
DATA_INPUT_URI='https://github.com/downloads/TomRoche/GEIA_to_netCDF/GEIA_N2O_oceanic.nc'
DATA_INPUT_FN="$(basename ${DATA_INPUT_URI})"
export DATA_INPUT_FP="${TEST_DIR}/${DATA_INPUT_FN}"
wget --no-check-certificate -c -O ${DATA_INPUT_FP} ${DATA_INPUT_URI}
export DATA_VAR_NAME='emi_n2o' # see ncdump output
export DATA_INPUT_BAND='3' # GEIA makes dim=TIME third
# output name is variant of input name
DATA_INPUT_FN_ROOT="${DATA_INPUT_FN%.*}"
DATA_INPUT_FN_EXT="${DATA_INPUT_FN#*.}"
DATA_OUTPUT_FN="${DATA_INPUT_FN_ROOT}_regrid.${DATA_INPUT_FN_EXT}"
export DATA_OUTPUT_FP="${TEST_DIR}/${DATA_OUTPUT_FN}"
# The template input is a copy of some meteorological data with 52
# "real" datavars. That data is in the IOAPI format, which is in this
# case basically a wrapper around netCDF. (IOAPI can be used with other
# data, and similar wrappers exist for various models.) I removed all
# but one of the datavars (with NCO 'ncks'), and renamed (with NCO
# 'ncrename') that datavar to have the same name as my input datavar
# (and hopefully my output datavar).
TEMPLATE_INPUT_URI='https://github.com/downloads/TomRoche/GEIA_to_netCDF/emis_mole_all_20080101_12US1_cmaq_cb05_soa_2008ab_08c.EXTENTS_INPUT.nc'
TEMPLATE_INPUT_FN="$(basename ${TEMPLATE_INPUT_URI})"
export TEMPLATE_INPUT_FP="${TEST_DIR}/${TEMPLATE_INPUT_FN}"
wget --no-check-certificate -c -O ${TEMPLATE_INPUT_FP} ${TEMPLATE_INPUT_URI}
export TEMPLATE_INPUT_BAND='1' # `ncks` makes dim=TSTEP first
# This script generates some basic statistics for a netCDF datavar.
STAT_SCRIPT_URI='https://github.com/TomRoche/GEIA_to_netCDF/raw/master/netCDF.stats.to.stdout.r'
STAT_SCRIPT_FN="$(basename ${STAT_SCRIPT_URI})"
export STAT_SCRIPT_FP="${TEST_DIR}/${STAT_SCRIPT_FN}"
wget --no-check-certificate -c -O ${STAT_SCRIPT_FP} ${STAT_SCRIPT_URI}
ls -alh ${TEST_DIR}/
R
# following done in R console-----------------------------------------
library(raster)
library(maptools)
# rgeos not available on my cluster, dunno if that's important
gpclibPermit()
library(rgdal)
test.dir <- Sys.getenv('TEST_DIR')
in.fp <- Sys.getenv('DATA_INPUT_FP')
in.band <- Sys.getenv('DATA_INPUT_BAND')
data.var.name <- Sys.getenv('DATA_VAR_NAME')
template.in.fp <- Sys.getenv('TEMPLATE_INPUT_FP')
template.band <- Sys.getenv('TEMPLATE_INPUT_BAND')
out.crs <- '+proj=lcc +lat_1=33 +lat_2=45 +lat_0=40 +lon_0=-97 +x=_0=-2556 +y_0=-1728'
out.fp <- Sys.getenv('DATA_OUTPUT_FP')
stat.script.fp <- Sys.getenv('STAT_SCRIPT_FP')
source(stat.script.fp)
system(sprintf('ncdump -h %s', in.fp))
# netcdf GEIA_N2O_oceanic {
# dimensions:
# lon = 360 ;
# lat = 180 ;
# time = UNLIMITED ; // (1 currently)
# variables:
# double lon(lon) ;
# ...
# double lat(lat) ;
# ...
# double time(time) ;
# ...
# double emi_n2o(time, lat, lon) ;
# emi_n2o:units = "ton N2O-N/yr" ;
# emi_n2o:missing_value = -999. ;
# emi_n2o:long_name = "N2O emissions" ;
# emi_n2o:_FillValue = -999.f ;
# Note missing/fill values: more below.
netCDF.stats.to.stdout(netcdf.fp=in.fp, var.name=data.var.name)
# For /tmp/projectRasterTest/GEIA_N2O_oceanic.nc var=emi_n2o
# cells=64800
# obs=36143
# min=5.96e-08
# q1=30.4
# med=67.7
# mean=99.5
# q3=140
# max=1.17e+03
# Note min and max of the input, and that min > 0.
# make input raster
in.raster <- raster(in.fp, varname=data.var.name, band=in.band)
in.raster[] <- runif(ncell(in.raster))
in.raster
# class : RasterLayer
# dimensions : 180, 360, 64800 (nrow, ncol, ncell)
# resolution : 1, 1 (x, y)
# extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
# coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
# values : in memory
# layer name : N2O.emissions
# min value : 4.321872e-05
# max value : 0.9999898
# z-value : 0
# Note min and max of the raster, and compare to
# min and max of the NetCDF (above) from which it's derived.
# My first question is, why is
# - min(raster) ~= 1000*min(netCDF)
# - max(netCDF) ~= 1000*min(raster)
# given that we have not yet regridded? Should not the values in the
# netCDF and the raster be the same at this point?
# make template raster
# truncate 'ncdump' to avoid IOAPI cruft
system(sprintf('ncdump -h %s | head -n 13', template.in.fp))
# netcdf emis_mole_all_20080101_12US1_cmaq_cb05_soa_2008ab_08c.EXTENTS_INPUT {
# dimensions:
# TSTEP = UNLIMITED ; // (25 currently)
# LAY = 1 ;
# ROW = 299 ;
# COL = 459 ;
# variables:
# float emi_n2o(TSTEP, LAY, ROW, COL) ;
# emi_n2o:long_name = "XYL " ;
# emi_n2o:units = "moles/s " ;
# emi_n2o:var_desc = "Model species XYL " ;
# Hmm, template datavar lacks missing/fill values--
# might be cause of similar problems below.
template.in.raster <- raster(template.in.fp, varname=data.var.name, band=template.band)
template.raster <- projectExtent(template.in.raster, out.crs)
# Warning message:
# In projectExtent(template.in.raster, out.crs) :
# 158 projected point(s) not finite
# My second question is, is that "projected point(s) not finite" warning
# important? How should I interpret that? (I note 158/137241 is small.)
template.raster
# class : RasterLayer
# dimensions : 299, 459, 137241 (nrow, ncol, ncell)
# resolution : 53369.55, 56883.69 (x, y)
# extent : -12246449, 12250173, -4532510, 12475715 (xmin, xmax, ymin, ymax)
# coord. ref. : +proj=lcc +lat_1=33 +lat_2=45 +lat_0=40 +lon_0=-97 +y_0=-1728 +ellps=WGS84
# values : none
# That looks good! Much more like the domain specification @
# https://github.com/TomRoche/cornbeltN2O/wiki/AQMEII-North-American-domain#wiki-EPA
# My only criticism is that, to me it seems odd to both
# report dimensions in nrow,ncol (which is basically y,x ... no?), but
# report resolution in x,y
# Am I missing something? (rhetorical question :-)
out.raster <-
projectRaster(
from=in.raster, to=template.raster, method='bilinear',
overwrite=TRUE, progress='window', format='CDF', filename=out.fp)
system(sprintf('ls -alht %s', test.dir))
# it wrote the output netCDF: good!
system(sprintf('ncdump -h %s', out.fp))
# netcdf GEIA_N2O_oceanic_regrid {
# dimensions:
# northing = 459 ;
# easting = 299 ;
# variables:
# double northing(northing) ;
# northing:units = "meter" ;
# double easting(easting) ;
# easting:units = "meter" ;
# While changing the names of the dimension and coordinate variables
# is disconcerting, I can live with that. More important are the
# following points, in increasing order of concern:
# float variable(easting, northing) ;
# variable:units = ;
# variable:missing_value = -3.4e+38f ;
# variable:_FillValue = -3.4e+38f ;
# variable:long_name = "variable" ;
# variable:projection = "+proj=lcc +lat_1=33 +lat_2=45 +lat_0=40 +lon_0=-97 +y_0=-1728 +ellps=WGS84" ;
# variable:projection_format = "PROJ.4" ;
# variable:min = -0.3366499f ;
# variable:max = 1.313758f ;
# + the input datavar emi_n2o(time, lat, lon)
# which ~= emi_n2o(time, row, col), and
# the output datavar variable(easting, northing)
# which ~= variable(row , col ) ;
# this is good.
# - the output missing/fill values = -3.4e+38f while
# the input missing/fill values = -999.f ;
# not a huge problem, but an annoyance.
# So my third question is, why the change in missing/fill value?
netCDF.stats.to.stdout(netcdf.fp=out.fp, var.name='variable')
# For /tmp/projectRasterTest/GEIA_N2O_oceanic_regrid.nc var=variable
# cells=137241
# obs=137241
# min=-0.337
# q1=0.357
# med=0.498
# mean=0.499
# q3=0.64
# max=1.31
# My fourth question is my greatest concern: why are there negative
# values in the projectRaster output, while there were none in
# either the input netCDF (in.fp) or the input raster (in.raster)?
# end example---------------------------------------------------------
> 'raster' could also conceivably offer more functionality to specify
> the properties of the ncdf file you want (see the additional
> arguments for writing to a ncdf file in ?writeRaster), and
> suggestions are always welcome.
I will try to follow up on that in future. (I very much want an R
solution for regridding--fortran should not be necessary for data
assimilation!) But for now my main concern is this data; hence ...
Your assistance is appreciated, Tom Roche <Tom_Roche at pobox.com>
More information about the R-sig-Geo
mailing list