[R-sig-Geo] [raster newbie] regridding global latlong -> CONUS LCC with projectRaster

Tom Roche Tom_Roche at pobox.com
Fri Oct 5 01:39:26 CEST 2012


summary: I present a (hopefully) replicable self-contained example of
practical projectRaster use, which

1 inputs from a netCDF file (gridded global lon-lat) with a single
  data variable (datavar)

2 creates an input raster from the input netCDF

3 creates a template raster from a second netCDF (gridded LCC over
  CONUS)

4 uses the input and template rasters to write an output netCDF.

I note results inline, about which I have 4 questions (the content of
which may be due to my lack of geospatial experience):

1 The data in the input NetCDF datavar and the input raster should be
  similar, no? Yet their respective mins and maxs differ by ~3
  magnitudes. Why?

2 In creating a template raster using projectExtent(), I get a
  "projected point(s) not finite" warning: is it important?

3 Why do the missing and fill values in the input and output netCDF
  differ?

4 Why are there negative values in the output netCDF, while there were
  none in either the input netCDF or the input raster?

details:

Tom Roche Wed, 03 Oct 2012 11:34:58 -0400
>> I'm getting very wrong output

Robert J. Hijmans Wed, 3 Oct 2012 20:27:07 -0700
> I do not know why you say the output is wrong. Why you would expect
> that input variables would have to be "preserved"

You're right, I should have worded that better. I also note that

* since the original problem (hang, due to user error) is fixed, this
  should be in a new thread.

* I should have used an extents template from the beginning (though
  projectExtent may still have a problem with my IOAPI file).

So I'll now present a (hopefully) replicable example, with results
inline, and raise specific concerns. The first part of the example is a
bash scriptlet that makes a test folder and populates it with some
downloadable files; the second part runs in an R console.

# begin example-------------------------------------------------------

# test folder, made/destroyed
export TEST_DIR='/tmp/projectRasterTest'
rm -fr ${TEST_DIR}
mkdir -p ${TEST_DIR}

# Input data is GEIA marine N2O emissions converted to netCDF:
# see code @ https://github.com/TomRoche/GEIA_to_netCDF

DATA_INPUT_URI='https://github.com/downloads/TomRoche/GEIA_to_netCDF/GEIA_N2O_oceanic.nc'
DATA_INPUT_FN="$(basename ${DATA_INPUT_URI})"
export DATA_INPUT_FP="${TEST_DIR}/${DATA_INPUT_FN}"
wget --no-check-certificate -c -O ${DATA_INPUT_FP} ${DATA_INPUT_URI}

export DATA_VAR_NAME='emi_n2o'  # see ncdump output
export DATA_INPUT_BAND='3'      # GEIA makes dim=TIME third

# output name is variant of input name

DATA_INPUT_FN_ROOT="${DATA_INPUT_FN%.*}"
DATA_INPUT_FN_EXT="${DATA_INPUT_FN#*.}"
DATA_OUTPUT_FN="${DATA_INPUT_FN_ROOT}_regrid.${DATA_INPUT_FN_EXT}"
export DATA_OUTPUT_FP="${TEST_DIR}/${DATA_OUTPUT_FN}"

# The template input is a copy of some meteorological data with 52
# "real" datavars. That data is in the IOAPI format, which is in this
# case basically a wrapper around netCDF. (IOAPI can be used with other
# data, and similar wrappers exist for various models.) I removed all
# but one of the datavars (with NCO 'ncks'), and renamed (with NCO
# 'ncrename') that datavar to have the same name as my input datavar
# (and hopefully my output datavar).

TEMPLATE_INPUT_URI='https://github.com/downloads/TomRoche/GEIA_to_netCDF/emis_mole_all_20080101_12US1_cmaq_cb05_soa_2008ab_08c.EXTENTS_INPUT.nc'
TEMPLATE_INPUT_FN="$(basename ${TEMPLATE_INPUT_URI})"
export TEMPLATE_INPUT_FP="${TEST_DIR}/${TEMPLATE_INPUT_FN}"
wget --no-check-certificate -c -O ${TEMPLATE_INPUT_FP} ${TEMPLATE_INPUT_URI}

export TEMPLATE_INPUT_BAND='1' # `ncks` makes dim=TSTEP first

# This script generates some basic statistics for a netCDF datavar.

STAT_SCRIPT_URI='https://github.com/TomRoche/GEIA_to_netCDF/raw/master/netCDF.stats.to.stdout.r'
STAT_SCRIPT_FN="$(basename ${STAT_SCRIPT_URI})"
export STAT_SCRIPT_FP="${TEST_DIR}/${STAT_SCRIPT_FN}"
wget --no-check-certificate -c -O ${STAT_SCRIPT_FP} ${STAT_SCRIPT_URI}

ls -alh ${TEST_DIR}/
R

# following done in R console-----------------------------------------

library(raster)
library(maptools)
# rgeos not available on my cluster, dunno if that's important
gpclibPermit()
library(rgdal)

test.dir <- Sys.getenv('TEST_DIR')
in.fp <- Sys.getenv('DATA_INPUT_FP')
in.band <- Sys.getenv('DATA_INPUT_BAND')
data.var.name <- Sys.getenv('DATA_VAR_NAME')
template.in.fp <- Sys.getenv('TEMPLATE_INPUT_FP')
template.band <- Sys.getenv('TEMPLATE_INPUT_BAND')
out.crs <- '+proj=lcc +lat_1=33 +lat_2=45 +lat_0=40 +lon_0=-97 +x=_0=-2556 +y_0=-1728'
out.fp <- Sys.getenv('DATA_OUTPUT_FP')
stat.script.fp <- Sys.getenv('STAT_SCRIPT_FP')
source(stat.script.fp)

system(sprintf('ncdump -h %s', in.fp))
# netcdf GEIA_N2O_oceanic {
# dimensions:
#         lon = 360 ;
#         lat = 180 ;
#         time = UNLIMITED ; // (1 currently)
# variables:
#         double lon(lon) ;
# ...
#         double lat(lat) ;
# ...
#         double time(time) ;
# ...
#         double emi_n2o(time, lat, lon) ;
#                 emi_n2o:units = "ton N2O-N/yr" ;
#                 emi_n2o:missing_value = -999. ;
#                 emi_n2o:long_name = "N2O emissions" ;
#                 emi_n2o:_FillValue = -999.f ;

# Note missing/fill values: more below.

netCDF.stats.to.stdout(netcdf.fp=in.fp, var.name=data.var.name)
# For /tmp/projectRasterTest/GEIA_N2O_oceanic.nc var=emi_n2o
#       cells=64800
#       obs=36143
#       min=5.96e-08
#       q1=30.4
#       med=67.7
#       mean=99.5
#       q3=140
#       max=1.17e+03

# Note min and max of the input, and that min > 0.

# make input raster
in.raster <- raster(in.fp, varname=data.var.name, band=in.band)
in.raster[] <- runif(ncell(in.raster))
in.raster
# class       : RasterLayer 
# dimensions  : 180, 360, 64800  (nrow, ncol, ncell)
# resolution  : 1, 1  (x, y)
# extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
# coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
# values      : in memory
# layer name  : N2O.emissions 
# min value   : 4.321872e-05 
# max value   : 0.9999898 
# z-value     : 0 

# Note min and max of the raster, and compare to
# min and max of the NetCDF (above) from which it's derived.

# My first question is, why is
# - min(raster) ~= 1000*min(netCDF)
# - max(netCDF) ~= 1000*min(raster)
# given that we have not yet regridded? Should not the values in the
# netCDF and the raster be the same at this point?

# make template raster
# truncate 'ncdump' to avoid IOAPI cruft
system(sprintf('ncdump -h %s | head -n 13', template.in.fp))
# netcdf emis_mole_all_20080101_12US1_cmaq_cb05_soa_2008ab_08c.EXTENTS_INPUT {
# dimensions:
#         TSTEP = UNLIMITED ; // (25 currently)
#         LAY = 1 ;
#         ROW = 299 ;
#         COL = 459 ;
# variables:
#         float emi_n2o(TSTEP, LAY, ROW, COL) ;
#                 emi_n2o:long_name = "XYL             " ;
#                 emi_n2o:units = "moles/s         " ;
#                 emi_n2o:var_desc = "Model species XYL                                                               " ;

# Hmm, template datavar lacks missing/fill values--
# might be cause of similar problems below.

template.in.raster <- raster(template.in.fp, varname=data.var.name, band=template.band)
template.raster <- projectExtent(template.in.raster, out.crs)
# Warning message:
# In projectExtent(template.in.raster, out.crs) :
#   158 projected point(s) not finite

# My second question is, is that "projected point(s) not finite" warning
# important? How should I interpret that? (I note 158/137241 is small.)

template.raster
# class       : RasterLayer 
# dimensions  : 299, 459, 137241  (nrow, ncol, ncell)
# resolution  : 53369.55, 56883.69  (x, y)
# extent      : -12246449, 12250173, -4532510, 12475715  (xmin, xmax, ymin, ymax)
# coord. ref. : +proj=lcc +lat_1=33 +lat_2=45 +lat_0=40 +lon_0=-97 +y_0=-1728 +ellps=WGS84 
# values      : none

# That looks good! Much more like the domain specification @
# https://github.com/TomRoche/cornbeltN2O/wiki/AQMEII-North-American-domain#wiki-EPA
# My only criticism is that, to me it seems odd to both
# report dimensions in nrow,ncol (which is basically y,x ... no?), but
# report resolution in x,y
# Am I missing something? (rhetorical question :-)

out.raster <-
  projectRaster(
    from=in.raster, to=template.raster, method='bilinear',
    overwrite=TRUE, progress='window', format='CDF', filename=out.fp)
system(sprintf('ls -alht %s', test.dir))
# it wrote the output netCDF: good!

system(sprintf('ncdump -h %s', out.fp))
# netcdf GEIA_N2O_oceanic_regrid {
# dimensions:
#       northing = 459 ;
#       easting = 299 ;
# variables:
#       double northing(northing) ;
#               northing:units = "meter" ;
#       double easting(easting) ;
#               easting:units = "meter" ;

# While changing the names of the dimension and coordinate variables
# is disconcerting, I can live with that. More important are the
# following points, in increasing order of concern:

#       float variable(easting, northing) ;
#               variable:units =  ;
#               variable:missing_value = -3.4e+38f ;
#               variable:_FillValue = -3.4e+38f ;
#               variable:long_name = "variable" ;
#               variable:projection = "+proj=lcc +lat_1=33 +lat_2=45 +lat_0=40 +lon_0=-97 +y_0=-1728 +ellps=WGS84" ;
#               variable:projection_format = "PROJ.4" ;
#               variable:min = -0.3366499f ;
#               variable:max = 1.313758f ;

# + the input datavar emi_n2o(time, lat, lon)
#   which ~=          emi_n2o(time, row, col), and
#   the output datavar variable(easting, northing)
#   which ~=           variable(row    , col     ) ;
#   this is good.

# - the output missing/fill values = -3.4e+38f while
#   the input  missing/fill values = -999.f ;
#   not a huge problem, but an annoyance.
#   So my third question is, why the change in missing/fill value?

netCDF.stats.to.stdout(netcdf.fp=out.fp, var.name='variable')
# For /tmp/projectRasterTest/GEIA_N2O_oceanic_regrid.nc var=variable
#       cells=137241
#       obs=137241
#       min=-0.337
#       q1=0.357
#       med=0.498
#       mean=0.499
#       q3=0.64
#       max=1.31

# My fourth question is my greatest concern: why are there negative
# values in the projectRaster output, while there were none in
# either the input netCDF (in.fp) or the input raster (in.raster)?

# end example---------------------------------------------------------

> 'raster' could also conceivably offer more functionality to specify
> the properties of the ncdf file you want (see the additional
> arguments for writing to a ncdf file in ?writeRaster), and
> suggestions are always welcome.

I will try to follow up on that in future. (I very much want an R
solution for regridding--fortran should not be necessary for data
assimilation!) But for now my main concern is this data; hence ...

Your assistance is appreciated, Tom Roche <Tom_Roche at pobox.com>



More information about the R-sig-Geo mailing list