[R-sig-Geo] readGDAL() and HDF5 files

Roger Bivand Roger.Bivand at nhh.no
Sun Jan 25 18:38:52 CET 2009


On Sat, 24 Jan 2009, Sebastian P. Luque wrote:

Hi,

If you have the cell centre offset for the lower left cell, then:

is.na(ice$band1) <- ice$band1 < 0
vice <- ice$band1
mice <- matrix(vice, ncol=760, byrow=TRUE)
mice1 <- t(mice[1120:1,])
grd <- GridTopology(c(-4941217, -4941217), c(10000,10000), c(760, 1120))
SG <- SpatialGrid(grd, proj4string=CRS("+init=epsg:3411"))
im <- list(x=sort(coordinatevalues(getGridTopology(SG))$s1),
  y=sort(coordinatevalues(getGridTopology(SG))$s2), z=mice1)
image(im, asp=1)
SGDF <- image2Grid(im, p4="+init=epsg:3411")
image(SGDF, axes=TRUE)


gets you there, but here is crucially dependent on knowing the offset 
(here taken as -90E, 31.08831N (value from netCDF description)). This is 
wrong, as:

SPixDF <- as(SGDF, "SpatialPixelsDataFrame")
SPDF <- as(SPixDF, "SpatialPointsDataFrame")
SPDF_ll <- spTransform(SPDF, CRS("+proj=longlat"))

then:

library(maptools)
xx <- GE_SpatialGrid(SPDF_ll)
png("ice.png", width=xx$width, height=xx$height, bg="transparent")
par(mar=c(0,0,0,0), xaxs="i", yaxs="i")
plot(SPDF_ll, cex=0.3, col="orange", xlim=xx$xlim, ylim=xx$ylim,
   setParUsrBB=TRUE)
dev.off()
kmlOverlay(xx, "ice.kml", "ice.png")

shows - it puts Greenland well West and a little North of its position 
when viewed in GE.

Getting there ...

Roger


> On Sat, 24 Jan 2009 12:56:13 +0100 (CET),
> Roger Bivand <Roger.Bivand at nhh.no> wrote:
>
>> On Fri, 23 Jan 2009, Sebastian P. Luque wrote:
>>> On Sat, 24 Jan 2009 08:11:15 +1100,
>>> Michael Sumner <mdsumner at utas.edu.au> wrote:
>
>>> Hi Sebastian, When you say the geographic coordinates are not a
>>>> regular grid - is it that the actual grid is Mercator but the NetCDF
>>>> file stores an X and Y vector separately for each unique longitude
>>>> and latitude? I've seen this many times, but never with enough
>>>> metadata to determine that without guessing. I've seen some
>>>> documents that refer to the source grid as being in Mercator, but
>>>> never any that mention explicitly the method used to generate the
>>>> NetCDF file from those.
>
>>>> If this is the case for your data, I've had success by figuring out
>>>> an offset/scale value that work sufficiently by assuming a Mercator
>>>> grid
>
>>>> Specifically this one but sometimes with an extra X offset to
>>>> overcome hemisphere shift.
>
>>>> CRSargs(CRS("+proj=merc"))
>
>>>> I don't have the details available today, but I can dig them up if
>>>> that sounds promising. Also, I'd be interested to hear any details
>>>> you have about the grids you are using, whether they use this
>>>> Mercator-kludge or not.
>
>>> Thanks for the feedback.
>
>>> You're right in that the geographic coordinates are not a regular
>>> grid, but separate data subsets in the NetCDF file, with the actual
>>> grid being in a polar stereographic projection (the one in my first
>>> post).  The actual coordinates of the grid are not available in any
>>> file (NetCDF nor HDF5), and neither these nor the geographical
>>> coordinates in the HDF5 files, but that doesn't matter in my case
>>> because I need to subset a smaller area, grid at an appropriate
>>> resolution using the geographical coordinates, and reproject to a
>>> different projection for my area.  After all, it's simple enough to
>>> access the geographical coordinates from any of the NetCDF files to
>>> map the grid onto such a coordinate system.
>
>>> Therefore, for working with these grids in R, I'd be happy if I could
>>> just load the grid correctly using the arbitrary grid coordinates
>>> that readGDAL() builds from the HDF5 file.  I haven't managed to read
>>> the NetCDF files into R (R segfaults), so things look better with
>>> HDF5 for working in R with these data.
>
>> Please do provide full version data for R, rgdal, GDAL, and the GDAL
>> drivers, including the HDF5 libraries. I did - please do the same: all
>> tests on Windows, OSGeo4W GDAL 1.5.4 with released drivers:
>
> I did provide most of this information in my first post, although I
> forgot to include info on non-R software, so here it is again:
>
> R> sessionInfo()
> R version 2.8.1 (2008-12-22)
> x86_64-pc-linux-gnu
>
> locale:
> LC_CTYPE=en_CA.UTF-8;LC_NUMERIC=C;LC_TIME=en_CA.UTF-8;LC_COLLATE=en_CA.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_CA.UTF-8;LC_PAPER=en_CA.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_CA.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] rgdal_0.6-5     sp_0.9-29       slmisc_0.7.0    lattice_0.17-20
>
> loaded via a namespace (and not attached):
> [1] grid_2.8.1
>
>
> and Debian sid libgdal1-1.5.2-3, which lags behind slightly the upstream
> version.  I'm also using Debian sid's libhdf5-serial-1.6.6-0 for the
> HDF5 libraries.  OSGeo4W is only for MS Windows IIUC.  So we may not be
> able to properly compare our results.
>
> For what it's worth, this is what I tried with one of the NetCDF files
> (using Debian sid's libnetcdf4 1:3.6.2-3.1, where ncdump-hdf is
> equivalent to the upstream ncdump facility):
>
> ---<--------------------cut here---------------start------------------->---
> $ ncdump-hdf -h ice_conc_nh_200901151200.nc
> netcdf ice_conc_nh_200901151200 {
> dimensions:
> 	time = 1 ;
> 	ni = 760 ;
> 	nj = 1120 ;
>
> variables:
> 	long time(time) ;
> 		time:long_name = "reference time of sea ice file" ;
> 		time:units = "seconds since 1981-01-01 00:00:00" ;
> 	float lat(nj, ni) ;
> 		lat:long_name = "latitude" ;
> 		lat:units = "degrees_north" ;
> 	float lon(nj, ni) ;
> 		lon:long_name = "longitude" ;
> 		lon:units = "degrees_east" ;
> 	char polar_stereographic ;
> 		polar_stereographic:grid_mapping_name = "polar_stereographic" ;
> 		polar_stereographic:straight_vertical_longitude_from_pole = -45.f ;
> 		polar_stereographic:latitude_of_projection_origin = 90.f ;
> 		polar_stereographic:standard_parallel = 70.f ;
> 		polar_stereographic:false_easting = -3850.f ;
> 		polar_stereographic:false_northing = 5850.f ;
> 		polar_stereographic:proj4_string = "+proj=stere +a=6378273 +b=6356889.44891 +lat_0=90 +lat_ts=70 +lon_0=-45" ;
> 	short ice_concentration(time, nj, ni) ;
> 		ice_concentration:long_name = "sea ice concentration" ;
> 		ice_concentration:standard_name = "ice_concentration" ;
> 		ice_concentration:units = "%" ;
> 		ice_concentration:coordinates = "lon lat" ;
> 		ice_concentration:grid_mapping = "polar_stereographic" ;
> 		ice_concentration:source = "EUMETSAT OSI SAF" ;
> 		ice_concentration:missing_value = -32767s ;
> 		ice_concentration:_FillValue = -32767s ;
> 		ice_concentration:valid_min = 0.f ;
> 		ice_concentration:valid_max = 100.f ;
> 		ice_concentration:scale_factor = 0.0099999998f ;
> 		ice_concentration:add_offset = 0.f ;
> 	byte confidence_level(time, nj, ni) ;
> 		confidence_level:long_name = "confidence level" ;
> 		confidence_level:units = "1" ;
> 		confidence_level:coordinates = "lon lat" ;
> 		confidence_level:grid_mapping = "polar_stereographic" ;
> 		confidence_level:missing_value = '\0' ;
> 		confidence_level:_FillValue = '\0' ;
> 		confidence_level:valid_min = '\1' ;
> 		confidence_level:valid_max = '\5' ;
> 		confidence_level:comment = "0 Unprocessed; 1 Erroneous; 2 Unreliable; 3 Acceptable; 4 Good; 5 Excellent" ;
> 	short quality_index(time, nj, ni) ;
> 		quality_index:long_name = "quality index" ;
> 		quality_index:units = "1" ;
> 		quality_index:coordinates = "lon lat" ;
> 		quality_index:grid_mapping = "polar_stereographic" ;
> 		quality_index:missing_value = 0s ;
> 		quality_index:_FillValue = 0s ;
> 		quality_index:comment = "Contains bitmap with quality flags, see Users Manual for details" ;
>
> // global attributes:
> 		:title = "Total Sea Ice Concentration from EUMETSAT OSI SAF" ;
> 		:conventions = "CF-1.0" ;
> 		:netcdf_version_id = "3.5.1 of Feb  7 2008 11:53:33 $" ;
> 		:creation_date = "2009-01-16" ;
> 		:product_version = "2.1" ;
> 		:software_version = "3.2" ;
> 		:reference = "OSI SAF Sea Ice Product Manual, Andersen S., Breivik L.A., Eastwood S., God?y ?., Lind M., Porcires M., Schyberg H., v3.5, January 2007" ;
> 		:comment = "For more information about the OSI SAF Sea Ice products, see http://saf.met.no" ;
> 		:sensor = "Multi sensor" ;
> 		:spatial_resolution = "10.0km" ;
> 		:area = "Northern Hemipshere" ;
> 		:southernmost_latitude = 31.08831f ;
> 		:northernmost_latitude = 89.934723f ;
> 		:westernmost_longitude = -180.f ;
> 		:easternmost_longitude = 179.92558f ;
> 		:start_date = "2009-01-15 UTC" ;
> 		:start_time = "00:00:00 UTC" ;
> 		:stop_date = "2009-01-15 UTC" ;
> 		:stop_time = "23:59:59 UTC" ;
> 		:field_type = "daily" ;
> 		:institution = "EUMETSAT OSI SAF" ;
> 		:institution_references = "http://www.osi-saf.org" ;
> 		:contact = "osisaf-manager at met.no" ;
> 		:operational_status = "operational" ;
> }
> $ gdalinfo ice_conc_nh_200901151200.nc
> Segmentation fault
> ---<--------------------cut here---------------end--------------------->---
>
> As can be seen, these files have more datasets.  Presumably the segfault
> I'm getting in R stems from the same problem with gdal above:
>
> ---<--------------------cut here---------------start------------------->---
> R> ncdata <- "ice_conc_nh_200901151200.nc"
> R> p4s <- NA
> R> ice <- readGDAL(ncdata, p4s=p4s)
>
> *** caught segfault ***
> address 0x1050, cause 'memory not mapped'
>
> Traceback:
> 1: .Call("RGDAL_OpenDataset", as.character(filename), TRUE, PACKAGE = "rgdal")
> 2: .local(.Object, ...)
> 3: initialize(value, ...)
> 4: initialize(value, ...)
> 5: new("GDALReadOnlyDataset", filename)
> 6: GDAL.open(fname)
> 7: readGDAL(ncdata, p4s = p4s)
>
> Possible actions:
> 1: abort (with core dump, if enabled)
> 2: normal R exit
> 3: exit R without saving workspace
> 4: exit R saving workspace
> ---<--------------------cut here---------------end--------------------->---
>
> although note that I don't know how to tell readGDAL to access the
> dataset that contains the actual grid I want (ice_concentration) for
> this file format.
>
> Anyway, based on your analysis below, it does seem as if the file header
> is not providing metadata that GDAL can actually understand.  Otherwise
> gdalinfo should not segfault like that.  I also get the same messed up
> image I get in R if I use gdaltranslate to see a TIFF image of the file.
>
> Thanks for looking into this.  I can live using your data "molesting"
> approach for now! :-)
>
>
> Cheers,
> Seb
>
>
>
>> http://download.osgeo.org/osgeo4w/osgeo4w-setup.exe
>
>> and unreleased Windows binary rgdal_0.6-6 from:
>
>> http://spatial.nhh.no/R/Devel/rgdal_0.6-6.zip
>
>> Then at least we are comparing like with like. Nothing should segfault
>> under any circumstances (though the GDAL HDF5 driver segfaults for me
>> when asking for non-existent data[01]). A fuller report on the NetCDF
>> case would be helpful.
>
>> My feeling at the moment is that the file is not self-documenting
>> itself in a portable way, because GDAL does read the data in an order
>> that isn't what the file header claims.
>
>>> The example I showed (and that Roger reproduced) builds a grid with
>>> the wrong dimensions.
>
>> The dimensions are what the file declares - I suspect that software
>> within the originator institution knows that they are reversed, and
>> does the same, but this isn't portable. I further suspect that the
>> 1120x1 block size is not helpful, and probably should be 760x1. See
>> below for analysis.
>
>>> This is what the header of the whole file (this is a new file, with
>>> the same structure as the one I showed) says (using HDF5 tools,
>>> http://hdf.ncsa.uiuc.edu/HDF5):
>
>>> ---<--------------------cut
>>> here---------------start------------------->--- $ h5dump -H
>>> ice_conc_nh_200901011200.hdf HDF5 "ice_conc_nh_200901011200.hdf" {
>>> GROUP "/" { GROUP "Data" { DATASET "data[00]" { DATATYPE
>>> H5T_IEEE_F32LE DATASPACE SIMPLE { ( 760, 1120 ) / ( 760, 1120 ) }
>>> ATTRIBUTE "Description" { DATATYPE H5T_STRING { STRSIZE 16; STRPAD
>>> H5T_STR_NULLTERM; CSET H5T_CSET_ASCII; CTYPE H5T_C_S1;
>>> }
>>> DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
>>> }
>>> }
>>> }
>>> DATASET "Header" { DATATYPE H5T_COMPOUND { H5T_STRING { STRSIZE 32;
>>> STRPAD H5T_STR_NULLTERM; CSET H5T_CSET_ASCII; CTYPE H5T_C_S1; }
>>> "source"; H5T_STRING { STRSIZE 16; STRPAD H5T_STR_NULLTERM; CSET
>>> H5T_CSET_ASCII; CTYPE H5T_C_S1; } "product"; H5T_STRING { STRSIZE 16;
>>> STRPAD H5T_STR_NULLTERM; CSET H5T_CSET_ASCII; CTYPE H5T_C_S1; }
>>> "area"; H5T_STRING { STRSIZE 128; STRPAD H5T_STR_NULLTERM; CSET
>>> H5T_CSET_ASCII; CTYPE H5T_C_S1; } "projstr"; H5T_STD_U32LE "iw";
>>> H5T_STD_U32LE "ih"; H5T_STD_U16LE "z"; H5T_IEEE_F32LE "Ax";
>>> H5T_IEEE_F32LE "Ay"; H5T_IEEE_F32LE "Bx"; H5T_IEEE_F32LE "By";
>>> H5T_STD_U32LE "year"; H5T_STD_U16LE "month"; H5T_STD_U16LE "day";
>>> H5T_STD_U16LE "hour"; H5T_STD_U16LE "minute";
>>> }
>>> DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
>>> }
>>> }
>>> }
>>> ---<--------------------cut
>>> here---------------end--------------------->---
>
>>> and the grid should have 1120 rows and 760 columns, as displayed by
>>> gdalinfo on the actual data:
>
>> No, see below, you are reversing axes in gdalinfo:
>
>
>>> ---<--------------------cut
>>> here---------------start------------------->--- $ gdalinfo
>>> HDF5:"ice_conc_nh_200901011200.hdf"://Data/data[00] Driver:
>>> HDF5Image/HDF5 Dataset Files: none associated Size is 1120, 760
>>> Coordinate System is: GEOGCS["WGS 84", DATUM["WGS_1984",
>>> SPHEROID["WGS 84",6378137,298.257223563, AUTHORITY["EPSG","7030"]],
>>> TOWGS84[0,0,0,0,0,0,0], AUTHORITY["EPSG","6326"]],
>>> PRIMEM["Greenwich",0, AUTHORITY["EPSG","8901"]],
>>> UNIT["degree",0.0174532925199433, AUTHORITY["EPSG","9108"]],
>>> AXIS["Lat",NORTH], AXIS["Long",EAST], AUTHORITY["EPSG","4326"]]
>>> Corner Coordinates: Upper Left ( 0.0, 0.0) Lower Left ( 0.0, 760.0)
>>> Upper Right ( 1120.0, 0.0) Lower Right ( 1120.0, 760.0) Center (
>>> 560.0, 380.0) Band 1 Block=1120x1 Type=Float32, ColorInterp=Undefined
>>> Metadata: data[00]:Description=Ice Conc ---<--------------------cut
>>> here---------------end--------------------->---
>
>>> but readGDAL() apparently read the dimensions in the opposite order:
>
>> No, exactly as the GDAL driver does, 760 rows and 1120 columns, left
>> is 0, right is 1120, upper is 0, lower is 760.
>
>> upper left (0,0) upper right (1120, 0)
>
>> lower left (0,760) lower right (1120, 760)
>
>> So either the GDAL driver is broken, or the data in the file is not in
>> sync with its metadata.
>
>> So far I get your figure by molesting the input metadata:
>
>> ice <- readGDAL("HDF5:\"conc_200901011200.hdf\"://Data/data[00]",
>> p4s=as.character(NA)) is.na(ice$band1) <- ice$band1 < 0 vice <-
>> ice$band1 mice <- matrix(vice, ncol=760, byrow=TRUE)
>> image(t(mice[1120:1,]), asp=1)
>
>> As I said before, the input HDF5 file has a completely wrong
>> description of its own projection, so my conclusion would be that it
>> is so badly configured as not to be usable in a portable way, since
>> both its geometry and projection are declared in error.
>
>> Hope this helps,
>
>> Roger
>
>
>
>>> ---<--------------------cut
>>> here---------------start------------------->---
> R> summary(ice)
>>> Object of class SpatialGridDataFrame Coordinates: min max x 0 1120 y
>>> 0 760 Is projected: NA proj4string : [NA] Number of points: 2 Grid
>>> attributes: cellcentre.offset cellsize cells.dim x 0.5 1 1120 y 0.5 1
>>> 760 Data attributes: Min. 1st Qu.  Median Mean 3rd Qu.  Max.  -32800
>>> -99 -99 -117 0 100 ---<--------------------cut
>>> here---------------end--------------------->---
>
>>> Does this make sense? and could this be the problem?  The data that
>>> are read into the `ice' object do look ok via summary(ice), but the
>>> way they're mapped onto the grid does not.
>
>
>>> Cheers,
>
>
>
>> -- Roger Bivand Economic Geography Section, Department of Economics,
>> Norwegian School of Economics and Business Administration, Helleveien
>> 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
>> e-mail: Roger.Bivand at nhh.no
>
>
>
> Cheers,
>
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no




More information about the R-sig-Geo mailing list