[R-sig-Geo] readGDAL() and HDF5 files

Sebastian P. Luque spluque at gmail.com
Sat Jan 24 20:21:02 CET 2009


On Sat, 24 Jan 2009 12:56:13 +0100 (CET),
Roger Bivand <Roger.Bivand at nhh.no> wrote:

> On Fri, 23 Jan 2009, Sebastian P. Luque wrote:
>> On Sat, 24 Jan 2009 08:11:15 +1100,
>> Michael Sumner <mdsumner at utas.edu.au> wrote:

>> Hi Sebastian, When you say the geographic coordinates are not a
>>> regular grid - is it that the actual grid is Mercator but the NetCDF
>>> file stores an X and Y vector separately for each unique longitude
>>> and latitude? I've seen this many times, but never with enough
>>> metadata to determine that without guessing. I've seen some
>>> documents that refer to the source grid as being in Mercator, but
>>> never any that mention explicitly the method used to generate the
>>> NetCDF file from those.

>>> If this is the case for your data, I've had success by figuring out
>>> an offset/scale value that work sufficiently by assuming a Mercator
>>> grid

>>> Specifically this one but sometimes with an extra X offset to
>>> overcome hemisphere shift.

>>> CRSargs(CRS("+proj=merc"))

>>> I don't have the details available today, but I can dig them up if
>>> that sounds promising. Also, I'd be interested to hear any details
>>> you have about the grids you are using, whether they use this
>>> Mercator-kludge or not.

>> Thanks for the feedback.

>> You're right in that the geographic coordinates are not a regular
>> grid, but separate data subsets in the NetCDF file, with the actual
>> grid being in a polar stereographic projection (the one in my first
>> post).  The actual coordinates of the grid are not available in any
>> file (NetCDF nor HDF5), and neither these nor the geographical
>> coordinates in the HDF5 files, but that doesn't matter in my case
>> because I need to subset a smaller area, grid at an appropriate
>> resolution using the geographical coordinates, and reproject to a
>> different projection for my area.  After all, it's simple enough to
>> access the geographical coordinates from any of the NetCDF files to
>> map the grid onto such a coordinate system.

>> Therefore, for working with these grids in R, I'd be happy if I could
>> just load the grid correctly using the arbitrary grid coordinates
>> that readGDAL() builds from the HDF5 file.  I haven't managed to read
>> the NetCDF files into R (R segfaults), so things look better with
>> HDF5 for working in R with these data.

> Please do provide full version data for R, rgdal, GDAL, and the GDAL
> drivers, including the HDF5 libraries. I did - please do the same: all
> tests on Windows, OSGeo4W GDAL 1.5.4 with released drivers:

I did provide most of this information in my first post, although I
forgot to include info on non-R software, so here it is again:

R> sessionInfo()
R version 2.8.1 (2008-12-22) 
x86_64-pc-linux-gnu 

locale:
LC_CTYPE=en_CA.UTF-8;LC_NUMERIC=C;LC_TIME=en_CA.UTF-8;LC_COLLATE=en_CA.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_CA.UTF-8;LC_PAPER=en_CA.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_CA.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rgdal_0.6-5     sp_0.9-29       slmisc_0.7.0    lattice_0.17-20

loaded via a namespace (and not attached):
[1] grid_2.8.1


and Debian sid libgdal1-1.5.2-3, which lags behind slightly the upstream
version.  I'm also using Debian sid's libhdf5-serial-1.6.6-0 for the
HDF5 libraries.  OSGeo4W is only for MS Windows IIUC.  So we may not be
able to properly compare our results.

For what it's worth, this is what I tried with one of the NetCDF files
(using Debian sid's libnetcdf4 1:3.6.2-3.1, where ncdump-hdf is
equivalent to the upstream ncdump facility):

---<--------------------cut here---------------start------------------->---
$ ncdump-hdf -h ice_conc_nh_200901151200.nc 
netcdf ice_conc_nh_200901151200 {
dimensions:
	time = 1 ;
	ni = 760 ;
	nj = 1120 ;

variables:
	long time(time) ;
		time:long_name = "reference time of sea ice file" ;
		time:units = "seconds since 1981-01-01 00:00:00" ;
	float lat(nj, ni) ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
	float lon(nj, ni) ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
	char polar_stereographic ;
		polar_stereographic:grid_mapping_name = "polar_stereographic" ;
		polar_stereographic:straight_vertical_longitude_from_pole = -45.f ;
		polar_stereographic:latitude_of_projection_origin = 90.f ;
		polar_stereographic:standard_parallel = 70.f ;
		polar_stereographic:false_easting = -3850.f ;
		polar_stereographic:false_northing = 5850.f ;
		polar_stereographic:proj4_string = "+proj=stere +a=6378273 +b=6356889.44891 +lat_0=90 +lat_ts=70 +lon_0=-45" ;
	short ice_concentration(time, nj, ni) ;
		ice_concentration:long_name = "sea ice concentration" ;
		ice_concentration:standard_name = "ice_concentration" ;
		ice_concentration:units = "%" ;
		ice_concentration:coordinates = "lon lat" ;
		ice_concentration:grid_mapping = "polar_stereographic" ;
		ice_concentration:source = "EUMETSAT OSI SAF" ;
		ice_concentration:missing_value = -32767s ;
		ice_concentration:_FillValue = -32767s ;
		ice_concentration:valid_min = 0.f ;
		ice_concentration:valid_max = 100.f ;
		ice_concentration:scale_factor = 0.0099999998f ;
		ice_concentration:add_offset = 0.f ;
	byte confidence_level(time, nj, ni) ;
		confidence_level:long_name = "confidence level" ;
		confidence_level:units = "1" ;
		confidence_level:coordinates = "lon lat" ;
		confidence_level:grid_mapping = "polar_stereographic" ;
		confidence_level:missing_value = '\0' ;
		confidence_level:_FillValue = '\0' ;
		confidence_level:valid_min = '\1' ;
		confidence_level:valid_max = '\5' ;
		confidence_level:comment = "0 Unprocessed; 1 Erroneous; 2 Unreliable; 3 Acceptable; 4 Good; 5 Excellent" ;
	short quality_index(time, nj, ni) ;
		quality_index:long_name = "quality index" ;
		quality_index:units = "1" ;
		quality_index:coordinates = "lon lat" ;
		quality_index:grid_mapping = "polar_stereographic" ;
		quality_index:missing_value = 0s ;
		quality_index:_FillValue = 0s ;
		quality_index:comment = "Contains bitmap with quality flags, see Users Manual for details" ;

// global attributes:
		:title = "Total Sea Ice Concentration from EUMETSAT OSI SAF" ;
		:conventions = "CF-1.0" ;
		:netcdf_version_id = "3.5.1 of Feb  7 2008 11:53:33 $" ;
		:creation_date = "2009-01-16" ;
		:product_version = "2.1" ;
		:software_version = "3.2" ;
		:reference = "OSI SAF Sea Ice Product Manual, Andersen S., Breivik L.A., Eastwood S., Godøy Ø., Lind M., Porcires M., Schyberg H., v3.5, January 2007" ;
		:comment = "For more information about the OSI SAF Sea Ice products, see http://saf.met.no" ;
		:sensor = "Multi sensor" ;
		:spatial_resolution = "10.0km" ;
		:area = "Northern Hemipshere" ;
		:southernmost_latitude = 31.08831f ;
		:northernmost_latitude = 89.934723f ;
		:westernmost_longitude = -180.f ;
		:easternmost_longitude = 179.92558f ;
		:start_date = "2009-01-15 UTC" ;
		:start_time = "00:00:00 UTC" ;
		:stop_date = "2009-01-15 UTC" ;
		:stop_time = "23:59:59 UTC" ;
		:field_type = "daily" ;
		:institution = "EUMETSAT OSI SAF" ;
		:institution_references = "http://www.osi-saf.org" ;
		:contact = "osisaf-manager at met.no" ;
		:operational_status = "operational" ;
}
$ gdalinfo ice_conc_nh_200901151200.nc 
Segmentation fault
---<--------------------cut here---------------end--------------------->---

As can be seen, these files have more datasets.  Presumably the segfault
I'm getting in R stems from the same problem with gdal above:

---<--------------------cut here---------------start------------------->---
R> ncdata <- "ice_conc_nh_200901151200.nc"
R> p4s <- NA
R> ice <- readGDAL(ncdata, p4s=p4s)

 *** caught segfault ***
address 0x1050, cause 'memory not mapped'

Traceback:
 1: .Call("RGDAL_OpenDataset", as.character(filename), TRUE, PACKAGE = "rgdal")
 2: .local(.Object, ...)
 3: initialize(value, ...)
 4: initialize(value, ...)
 5: new("GDALReadOnlyDataset", filename)
 6: GDAL.open(fname)
 7: readGDAL(ncdata, p4s = p4s)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
---<--------------------cut here---------------end--------------------->---

although note that I don't know how to tell readGDAL to access the
dataset that contains the actual grid I want (ice_concentration) for
this file format.

Anyway, based on your analysis below, it does seem as if the file header
is not providing metadata that GDAL can actually understand.  Otherwise
gdalinfo should not segfault like that.  I also get the same messed up
image I get in R if I use gdaltranslate to see a TIFF image of the file.

Thanks for looking into this.  I can live using your data "molesting"
approach for now! :-)


Cheers,
Seb



> http://download.osgeo.org/osgeo4w/osgeo4w-setup.exe

> and unreleased Windows binary rgdal_0.6-6 from:

> http://spatial.nhh.no/R/Devel/rgdal_0.6-6.zip

> Then at least we are comparing like with like. Nothing should segfault
> under any circumstances (though the GDAL HDF5 driver segfaults for me
> when asking for non-existent data[01]). A fuller report on the NetCDF
> case would be helpful.

> My feeling at the moment is that the file is not self-documenting
> itself in a portable way, because GDAL does read the data in an order
> that isn't what the file header claims.

>> The example I showed (and that Roger reproduced) builds a grid with
>> the wrong dimensions.

> The dimensions are what the file declares - I suspect that software
> within the originator institution knows that they are reversed, and
> does the same, but this isn't portable. I further suspect that the
> 1120x1 block size is not helpful, and probably should be 760x1. See
> below for analysis.

>> This is what the header of the whole file (this is a new file, with
>> the same structure as the one I showed) says (using HDF5 tools,
>> http://hdf.ncsa.uiuc.edu/HDF5):

>> ---<--------------------cut
>> here---------------start------------------->--- $ h5dump -H
>> ice_conc_nh_200901011200.hdf HDF5 "ice_conc_nh_200901011200.hdf" {
>> GROUP "/" { GROUP "Data" { DATASET "data[00]" { DATATYPE
>> H5T_IEEE_F32LE DATASPACE SIMPLE { ( 760, 1120 ) / ( 760, 1120 ) }
>> ATTRIBUTE "Description" { DATATYPE H5T_STRING { STRSIZE 16; STRPAD
>> H5T_STR_NULLTERM; CSET H5T_CSET_ASCII; CTYPE H5T_C_S1;
>> }
>> DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
>> }
>> }
>> }
>> DATASET "Header" { DATATYPE H5T_COMPOUND { H5T_STRING { STRSIZE 32;
>> STRPAD H5T_STR_NULLTERM; CSET H5T_CSET_ASCII; CTYPE H5T_C_S1; }
>> "source"; H5T_STRING { STRSIZE 16; STRPAD H5T_STR_NULLTERM; CSET
>> H5T_CSET_ASCII; CTYPE H5T_C_S1; } "product"; H5T_STRING { STRSIZE 16;
>> STRPAD H5T_STR_NULLTERM; CSET H5T_CSET_ASCII; CTYPE H5T_C_S1; }
>> "area"; H5T_STRING { STRSIZE 128; STRPAD H5T_STR_NULLTERM; CSET
>> H5T_CSET_ASCII; CTYPE H5T_C_S1; } "projstr"; H5T_STD_U32LE "iw";
>> H5T_STD_U32LE "ih"; H5T_STD_U16LE "z"; H5T_IEEE_F32LE "Ax";
>> H5T_IEEE_F32LE "Ay"; H5T_IEEE_F32LE "Bx"; H5T_IEEE_F32LE "By";
>> H5T_STD_U32LE "year"; H5T_STD_U16LE "month"; H5T_STD_U16LE "day";
>> H5T_STD_U16LE "hour"; H5T_STD_U16LE "minute";
>> }
>> DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
>> }
>> }
>> }
>> ---<--------------------cut
>> here---------------end--------------------->---

>> and the grid should have 1120 rows and 760 columns, as displayed by
>> gdalinfo on the actual data:

> No, see below, you are reversing axes in gdalinfo:


>> ---<--------------------cut
>> here---------------start------------------->--- $ gdalinfo
>> HDF5:"ice_conc_nh_200901011200.hdf"://Data/data[00] Driver:
>> HDF5Image/HDF5 Dataset Files: none associated Size is 1120, 760
>> Coordinate System is: GEOGCS["WGS 84", DATUM["WGS_1984",
>> SPHEROID["WGS 84",6378137,298.257223563, AUTHORITY["EPSG","7030"]],
>> TOWGS84[0,0,0,0,0,0,0], AUTHORITY["EPSG","6326"]],
>> PRIMEM["Greenwich",0, AUTHORITY["EPSG","8901"]],
>> UNIT["degree",0.0174532925199433, AUTHORITY["EPSG","9108"]],
>> AXIS["Lat",NORTH], AXIS["Long",EAST], AUTHORITY["EPSG","4326"]]
>> Corner Coordinates: Upper Left ( 0.0, 0.0) Lower Left ( 0.0, 760.0)
>> Upper Right ( 1120.0, 0.0) Lower Right ( 1120.0, 760.0) Center (
>> 560.0, 380.0) Band 1 Block=1120x1 Type=Float32, ColorInterp=Undefined
>> Metadata: data[00]:Description=Ice Conc ---<--------------------cut
>> here---------------end--------------------->---

>> but readGDAL() apparently read the dimensions in the opposite order:

> No, exactly as the GDAL driver does, 760 rows and 1120 columns, left
> is 0, right is 1120, upper is 0, lower is 760.

> upper left (0,0) upper right (1120, 0)

> lower left (0,760) lower right (1120, 760)

> So either the GDAL driver is broken, or the data in the file is not in
> sync with its metadata.

> So far I get your figure by molesting the input metadata:

> ice <- readGDAL("HDF5:\"conc_200901011200.hdf\"://Data/data[00]",
> p4s=as.character(NA)) is.na(ice$band1) <- ice$band1 < 0 vice <-
> ice$band1 mice <- matrix(vice, ncol=760, byrow=TRUE)
> image(t(mice[1120:1,]), asp=1)

> As I said before, the input HDF5 file has a completely wrong
> description of its own projection, so my conclusion would be that it
> is so badly configured as not to be usable in a portable way, since
> both its geometry and projection are declared in error.

> Hope this helps,

> Roger



>> ---<--------------------cut
>> here---------------start------------------->---
R> summary(ice)
>> Object of class SpatialGridDataFrame Coordinates: min max x 0 1120 y
>> 0 760 Is projected: NA proj4string : [NA] Number of points: 2 Grid
>> attributes: cellcentre.offset cellsize cells.dim x 0.5 1 1120 y 0.5 1
>> 760 Data attributes: Min. 1st Qu.  Median Mean 3rd Qu.  Max.  -32800
>> -99 -99 -117 0 100 ---<--------------------cut
>> here---------------end--------------------->---

>> Does this make sense? and could this be the problem?  The data that
>> are read into the `ice' object do look ok via summary(ice), but the
>> way they're mapped onto the grid does not.


>> Cheers,



> -- Roger Bivand Economic Geography Section, Department of Economics,
> Norwegian School of Economics and Business Administration, Helleveien
> 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
> e-mail: Roger.Bivand at nhh.no



Cheers,

-- 
Seb




More information about the R-sig-Geo mailing list