[R-sig-Geo] readGDAL() and HDF5 files

Roger Bivand Roger.Bivand at nhh.no
Sat Jan 24 12:56:13 CET 2009


On Fri, 23 Jan 2009, Sebastian P. Luque wrote:

> On Sat, 24 Jan 2009 08:11:15 +1100,
> Michael Sumner <mdsumner at utas.edu.au> wrote:
>
>> Hi Sebastian, When you say the geographic coordinates are not a
>> regular grid - is it that the actual grid is Mercator but the NetCDF
>> file stores an X and Y vector separately for each unique longitude and
>> latitude? I've seen this many times, but never with enough metadata to
>> determine that without guessing. I've seen some documents that refer
>> to the source grid as being in Mercator, but never any that mention
>> explicitly the method used to generate the NetCDF file from those.
>
>> If this is the case for your data, I've had success by figuring out an
>> offset/scale value that work sufficiently by assuming a Mercator grid
>
>> Specifically this one but sometimes with an extra X offset to overcome
>> hemisphere shift.
>
>> CRSargs(CRS("+proj=merc"))
>
>> I don't have the details available today, but I can dig them up if
>> that sounds promising. Also, I'd be interested to hear any details you
>> have about the grids you are using, whether they use this
>> Mercator-kludge or not.
>
> Thanks for the feedback.
>
> You're right in that the geographic coordinates are not a regular grid,
> but separate data subsets in the NetCDF file, with the actual grid being
> in a polar stereographic projection (the one in my first post).  The
> actual coordinates of the grid are not available in any file (NetCDF nor
> HDF5), and neither these nor the geographical coordinates in the HDF5
> files, but that doesn't matter in my case because I need to subset a
> smaller area, grid at an appropriate resolution using the geographical
> coordinates, and reproject to a different projection for my area.  After
> all, it's simple enough to access the geographical coordinates from any
> of the NetCDF files to map the grid onto such a coordinate system.
>
> Therefore, for working with these grids in R, I'd be happy if I could
> just load the grid correctly using the arbitrary grid coordinates that
> readGDAL() builds from the HDF5 file.  I haven't managed to read the
> NetCDF files into R (R segfaults), so things look better with HDF5 for
> working in R with these data.

Please do provide full version data for R, rgdal, GDAL, and the GDAL 
drivers, including the HDF5 libraries. I did - please do the same: all 
tests on Windows, OSGeo4W GDAL 1.5.4 with released drivers:

http://download.osgeo.org/osgeo4w/osgeo4w-setup.exe

and unreleased Windows binary rgdal_0.6-6 from:

http://spatial.nhh.no/R/Devel/rgdal_0.6-6.zip

Then at least we are comparing like with like. Nothing should segfault 
under any circumstances (though the GDAL HDF5 driver segfaults for me when 
asking for non-existent data[01]). A fuller report on the NetCDF case 
would be helpful.

My feeling at the moment is that the file is not self-documenting itself 
in a portable way, because GDAL does read the data in an order that isn't 
what the file header claims.

> The example I showed (and that Roger
> reproduced) builds a grid with the wrong dimensions.

The dimensions are what the file declares - I suspect that software within 
the originator institution knows that they are reversed, and does the 
same, but this isn't portable. I further suspect that the 1120x1 block 
size is not helpful, and probably should be 760x1. See below for analysis.

>  This is what the
> header of the whole file (this is a new file, with the same structure as
> the one I showed) says (using HDF5 tools,
> http://hdf.ncsa.uiuc.edu/HDF5):
>
> ---<--------------------cut here---------------start------------------->---
> $ h5dump -H ice_conc_nh_200901011200.hdf
> HDF5 "ice_conc_nh_200901011200.hdf" {
> GROUP "/" {
>   GROUP "Data" {
>      DATASET "data[00]" {
>         DATATYPE  H5T_IEEE_F32LE
>         DATASPACE  SIMPLE { ( 760, 1120 ) / ( 760, 1120 ) }
>         ATTRIBUTE "Description" {
>            DATATYPE  H5T_STRING {
>                  STRSIZE 16;
>                  STRPAD H5T_STR_NULLTERM;
>                  CSET H5T_CSET_ASCII;
>                  CTYPE H5T_C_S1;
>               }
>            DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
>         }
>      }
>   }
>   DATASET "Header" {
>      DATATYPE  H5T_COMPOUND {
>         H5T_STRING {
>            STRSIZE 32;
>            STRPAD H5T_STR_NULLTERM;
>            CSET H5T_CSET_ASCII;
>            CTYPE H5T_C_S1;
>         } "source";
>         H5T_STRING {
>            STRSIZE 16;
>            STRPAD H5T_STR_NULLTERM;
>            CSET H5T_CSET_ASCII;
>            CTYPE H5T_C_S1;
>         } "product";
>         H5T_STRING {
>            STRSIZE 16;
>            STRPAD H5T_STR_NULLTERM;
>            CSET H5T_CSET_ASCII;
>            CTYPE H5T_C_S1;
>         } "area";
>         H5T_STRING {
>            STRSIZE 128;
>            STRPAD H5T_STR_NULLTERM;
>            CSET H5T_CSET_ASCII;
>            CTYPE H5T_C_S1;
>         } "projstr";
>         H5T_STD_U32LE "iw";
>         H5T_STD_U32LE "ih";
>         H5T_STD_U16LE "z";
>         H5T_IEEE_F32LE "Ax";
>         H5T_IEEE_F32LE "Ay";
>         H5T_IEEE_F32LE "Bx";
>         H5T_IEEE_F32LE "By";
>         H5T_STD_U32LE "year";
>         H5T_STD_U16LE "month";
>         H5T_STD_U16LE "day";
>         H5T_STD_U16LE "hour";
>         H5T_STD_U16LE "minute";
>      }
>      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
>   }
> }
> }
> ---<--------------------cut here---------------end--------------------->---
>
> and the grid should have 1120 rows and 760 columns, as displayed by
> gdalinfo on the actual data:

No, see below, you are reversing axes in gdalinfo:

>
> ---<--------------------cut here---------------start------------------->---
> $ gdalinfo HDF5:"ice_conc_nh_200901011200.hdf"://Data/data[00]
> Driver: HDF5Image/HDF5 Dataset
> Files: none associated
> Size is 1120, 760
> Coordinate System is:
> GEOGCS["WGS 84",
>    DATUM["WGS_1984",
>        SPHEROID["WGS 84",6378137,298.257223563,
>            AUTHORITY["EPSG","7030"]],
>        TOWGS84[0,0,0,0,0,0,0],
>        AUTHORITY["EPSG","6326"]],
>    PRIMEM["Greenwich",0,
>        AUTHORITY["EPSG","8901"]],
>    UNIT["degree",0.0174532925199433,
>        AUTHORITY["EPSG","9108"]],
>    AXIS["Lat",NORTH],
>    AXIS["Long",EAST],
>    AUTHORITY["EPSG","4326"]]
> Corner Coordinates:
> Upper Left  (    0.0,    0.0)
> Lower Left  (    0.0,  760.0)
> Upper Right ( 1120.0,    0.0)
> Lower Right ( 1120.0,  760.0)
> Center      (  560.0,  380.0)
> Band 1 Block=1120x1 Type=Float32, ColorInterp=Undefined
>  Metadata:
>    data[00]:Description=Ice Conc
> ---<--------------------cut here---------------end--------------------->---
>
> but readGDAL() apparently read the dimensions in the opposite order:

No, exactly as the GDAL driver does, 760 rows and 1120 columns, left is 0, 
right is 1120, upper is 0, lower is 760.

upper left (0,0)                        upper right (1120, 0)

lower left (0,760)                      lower right (1120, 760)

So either the GDAL driver is broken, or the data in the file is not in 
sync with its metadata.

So far I get your figure by molesting the input metadata:

ice <- readGDAL("HDF5:\"conc_200901011200.hdf\"://Data/data[00]",
  p4s=as.character(NA))
is.na(ice$band1) <- ice$band1 < 0
vice <- ice$band1
mice <- matrix(vice, ncol=760, byrow=TRUE)
image(t(mice[1120:1,]), asp=1)

As I said before, the input HDF5 file has a completely wrong description 
of its own projection, so my conclusion would be that it is so badly 
configured as not to be usable in a portable way, since both its geometry 
and projection are declared in error.

Hope this helps,

Roger


>
> ---<--------------------cut here---------------start------------------->---
> R> summary(ice)
> Object of class SpatialGridDataFrame
> Coordinates:
>  min  max
> x   0 1120
> y   0  760
> Is projected: NA
> proj4string : [NA]
> Number of points: 2
> Grid attributes:
>  cellcentre.offset cellsize cells.dim
> x               0.5        1      1120
> y               0.5        1       760
> Data attributes:
>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> -32800     -99     -99    -117       0     100
> ---<--------------------cut here---------------end--------------------->---
>
> Does this make sense? and could this be the problem?  The data that are
> read into the `ice' object do look ok via summary(ice), but the way
> they're mapped onto the grid does not.
>
>
> Cheers,
>
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no




More information about the R-sig-Geo mailing list