[R] [ncdf] programmatically copying a netCDF file

Tom Roche Tom_Roche at pobox.com
Tue Jan 10 00:54:43 CET 2012


summary: Programmatically copying netCDF mostly works: thanks for your
assistance! However, I encountered 4 problems, described below as
followup questions/responses (along with my motivation).

details:

Tom Roche Thu, 05 Jan 2012 18:29:35 -0500
>> I need to "do surgery" on a large netCDF file (technically an
>> I/O API file which uses netCDF).

David William Pierce Thu, 5 Jan 2012 19:49:13 -0800
> simply copying the file generally isn't the point of an R script.

:-) I guess I should have explained: I need to copy most of a source
file, modifying only part, and to write a target file. So my motivation
for this thread is, first, to be sure I can do the copying correctly
(*not* merely to copy a netCDF file). Does this seem reasonable?

> If I wanted to copy a var from an existing file to a new file,
> manipulating it along the way, I'd do something like this (untested
> code off the top of my head):

see
https://stat.ethz.ch/pipermail/r-help/attachments/20120105/f7644171/attachment.pl

> Hope that gets you started,

I had already started, but much less programmatically: I was using the
ncdf API, but with names and indices copied from `ncdump -h` or
`summary.ncdf`. Your code is better! at least, much less error-prone.
(Although too terse for this R newbie: I rewrote more verbosely.)
However, I am noticing a few problems, for which I'd appreciate help if
available (or correction if invalid, else a pointer to bug reporting):

1 Precisions "int" and "float" not supported by var.def.ncdf(...).
  When I tried to do (formatted for email)

  target.datavars[[target.datavars.i]] <-
    var.def.ncdf(source.datavar$name, source.datavar$units, 
    target.datavar.dims, source.datavar$missval, 
->  prec=source.datavar$prec)

  I got

- var.def.ncdf: error: unknown precision specified: int .
- Known values: short single double integer char byte

  and similarly for precision="float". So I wrote a kludge function

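+ precConvert <- function(prec.in) {
+   # map the precision names found in the source file to the names
+   # that var.def.ncdf accepts; fail loudly on anything unexpected
+   switch(prec.in,
+     'byte'='byte',
+     'char'='char',
+     'double'='double',
+     'float'='single',
+     'int'='integer',
+     'integer'='integer',
+     'short'='short',
+     'single'='single',
+     stop("unknown precision: ", prec.in)
+   )
+ }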

  and successfully did

  target.datavars[[target.datavars.i]] <-
    var.def.ncdf(source.datavar$name, source.datavar$units,
    target.datavar.dims, source.datavar$missval,
+>  prec=precConvert(source.datavar$prec))

  Should this "just work"?

2 Copying I/O API global attributes fails. I/O API uses lots of these
  (33 in my source.nc!), so my diff has

-// global attributes:
-               :IOAPI_VERSION = "1.0 1997349 (Dec. 15, 1997)" ;
-               :EXEC_ID = "????????????????                 " ;
-               :FTYPE = 1 ;
-               :CDATE = 2011353 ;
-               :CTIME = 1224 ;
...

  However when I do

> global.attr.name.list <- list(
+   ":IOAPI_VERSION",
+   ":EXEC_ID",
+   ":FTYPE",
+   ":CDATE",
+   ":CTIME",
...
+ )
> for (attr.name in global.attr.name.list) {
+   source.datavar.attr <- att.get.ncdf(source.file, 0, attr.name)
+   att.put.ncdf(target.file, 0, attr.name, source.datavar.attr$value)
+ }

  I get (lines broken for email)

- Error in R_nc_put_att_double: 
-   NetCDF: Name contains illegal characters
- [1] "Error in att.put.ncdf, while writing attribute :IOAPI_VERSION
-    with value 0"
- Error in att.put.ncdf(target.file, 0, attr.name,
-   source.datavar.attr$value) : 
-   Error return from C call R_nc_put_att_double for attribute
-     :IOAPI_VERSION

  Is my code, ncdf, I/O API, or Something Completely Different
  causing this error?
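
  One guess, which I have not verified against my files: `ncdump -h`
  prints global attributes with a leading ':' as part of its CDL
  notation, so the colon may not belong to the attribute name itself.
  That would fit both the "illegal characters" message and the
  "value 0" (the attribute was never found). A minimal sketch of
  stripping it, where stripColon is a hypothetical helper of my own and
  not part of ncdf:

```r
# Hypothetical helper (not part of ncdf): drop the leading ':' that
# `ncdump -h` prints in front of global attribute names.
stripColon <- function(attr.names) {
  # unlist() turns the list of names into a character vector;
  # sub() removes at most one ':' at the start of each name.
  sub("^:", "", unlist(attr.names))
}

global.attr.names <- stripColon(list(":IOAPI_VERSION", ":EXEC_ID", ":FTYPE"))
print(global.attr.names)
```

  The cleaned names could then be used in the loop above in place of
  the ':'-prefixed ones. Whether that is sufficient for I/O API files
  is an assumption on my part.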

3 When I diff my `ncdump`s, i.e.,

$ diff -uwB  <( ncdump -h source.nc ) <( ncdump -h target.nc )

  I get

> --- /dev/fd/63  2012-01-09 17:20:30.258837803 -0500
> +++ /dev/fd/62  2012-01-09 17:20:30.258837803 -0500
> @@ -1,194 +1,29 @@
> -netcdf \5yravg.test {
> +netcdf \5yravg.onlyOrigDN2 {
>  dimensions:
> -       TSTEP = UNLIMITED ; // (1 currently)
>         DATE-TIME = 2 ;
> -       LAY = 42 ;
>         VAR = 29 ;
> -       ROW = 299 ;
> +       TSTEP = UNLIMITED ; // (1 currently)
>         COL = 459 ;
> +       ROW = 299 ;
> +       LAY = 42 ;
>  variables:
> +       int DATE-TIME(DATE-TIME) ;
> +               DATE-TIME:units = "" ;
> +       int VAR(VAR) ;
> +               VAR:units = "" ;
> +       int TSTEP(TSTEP) ;
> +               TSTEP:units = "" ;

  Reordering the dimensions I can live with: what annoys/confuses me is

* the target file has *new* coordinate variables for the dimensions.

* I don't understand why those coordinate variables weren't in the
  source file. But they're not!

  (Note I also get new data variables for dims={COL, LAY, ROW}, farther
  down the diff.) To clarify, e.g.: there is no variable

int DATE-TIME(DATE-TIME) ;

  in the source file.

4 Attribute "long_name" is missing from the target file for every
  copied data variable. Hence when I diff my `ncdump`s I also get, e.g.,

        int TFLAG(TSTEP, VAR, DATE-TIME) ;
                TFLAG:units = "<YYYYDDD,HHMMSS>" ;
-               TFLAG:long_name = "TFLAG           " ;
                TFLAG:var_desc = "..."

  How can I fix or work around this? Note that others have previously
  written

http://www.image.ucar.edu/Software/Netcdf/
> I believe there is a bug in the ncdf library which is causing the
> longname attribute to be ignored.

Your assistance is appreciated! and if I should submit patches or bug
reports somewhere, please let me know.

HTH, Tom Roche <Tom_Roche at pobox.com>


