[R] Problem with converting grib file to excel
Roy Mendelssohn - NOAA Federal
roy.mendelssohn at noaa.gov
Tue Oct 1 01:52:18 CEST 2024
I have corresponded with Javad off-line; I am posting this as a follow-up to close the issue. There are two separate questions here. The first is why the posted code below failed. The second is whether there is an easy way to read in the values, given the oddities of his file (more on that below), and yes, it turns out "terra" performs much better than "raster".
First, a little about GRIB files in general, and this file in particular. GRIB files were designed to store grids in files with a very small footprint, usually from model output. GRIB files have the characteristic that if I "cat" a bunch of GRIB files together, I still have a valid GRIB file. So a given GRIB file often contains multiple parameters, and each of those through time. To translate into terms used by R spatial packages, each variable's grid at each time period will be seen as a "band" or a "layer" in the dataset. Thus if I have 3 parameters, say temperature, dew point and cloud cover, at 8670 time periods, R spatial packages will see 3*8670 layers or bands.
However, this GRIB file is clearly not a raw GRIB file made by a data provider, but rather an extract made with some tool. In this file there are 3 underlying time series: temperature, dew point and cloud cover, each of dimension (1, 1, 8670); that is, each is defined at one spatial point only.
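A quick way to confirm this structure is to open the file with "terra" and look at its dimensions (a sketch; the exact layer names will differ from what I show):
library(terra)
r <- terra::rast("Met.grib")
terra::nlyr(r)   # 3 variables * 8670 time steps = 26010 layers
dim(r)           # 1 row, 1 column, 26010 layers -- a single grid point
head(names(r))   # long GDAL/GRIB descriptions, not short codes like "t2m"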
So why did the code below fail? raster::stack() read the file just fine, and the layer_names are correct, all 3*8670 of them. The next steps:
>> # Extract layers based on layer names - adjust as necessary
>> t2m <- raster_data[[grep("t2m", layer_names)]]
>> d2m <- raster_data[[grep("d2m", layer_names)]]
>> tcc <- raster_data[[grep("tcc", layer_names)]]
>> valid_time <- raster_data[[grep("valid_time", layer_names)]]
>> t2m
which are aimed at subsetting the raster stack based on the variable name, all fail because "t2m", "d2m" and "tcc" do not appear anywhere in the layer_names. So then of course everything else fails.
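To see what went wrong, it helps to look at what grep() actually returns here (a sketch; the layer names in the file are long GDAL/GRIB descriptions rather than short codes like "t2m"):
idx <- grep("t2m", layer_names)   # integer(0): no layer name contains "t2m"
t2m <- raster_data[[idx]]         # indexing with integer(0) gives an empty stack,
t2m                               # hence the "nlayers : 0" seen in the output below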
So the next question is how to read the file and get a data frame. A first pass would be:
raster_stack <- raster::stack("Met.grib")
raster_data <- raster::getValues(raster_stack)
but you do not want to do that. For whatever reason, the raster_stack created above is enormous, and raster::getValues() takes forever; I aborted after about an hour. However, if you use "terra" instead:
raster_stack <- terra::rast("Met.grib")
raster_data <- terra::values(raster_stack)
it finishes in a flash. raster_data is now one long array that needs to be reshaped:
raster_data <- array(raster_data, dim = c(3, 8670))
Now raster_data[1, ] contains the temperature series, raster_data[2, ] contains the dew point data, and raster_data[3, ] contains the cloud cover data.
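For completeness, here is a hedged sketch of going from those reshaped values to the Excel file that was the original goal. It assumes the layer order really is (t2m, d2m, tcc) repeated for each time step, and that terra::time() returns usable timestamps for this file; both are worth checking before trusting the output.
library(terra)
library(openxlsx)
r     <- terra::rast("Met.grib")
vals  <- terra::values(r)                # 1 cell x (3 * 8670) layers
m     <- array(vals, dim = c(3, 8670))   # rows: t2m, d2m, tcc (assumed order)
times <- terra::time(r)[seq(1, terra::nlyr(r), by = 3)]  # one timestamp per step (assumption)
df <- data.frame(
  valid_time = times,
  t2m = m[1, ] - 273.15,                 # Kelvin -> Celsius
  d2m = m[2, ] - 273.15,
  tcc = m[3, ]
)
openxlsx::write.xlsx(df, "Met_updated.xlsx")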
HTH,
-Roy
PS - GRIB files can be a bear. The best GRIB readers in R are just front ends for either eccodes or wgrib2, and while they work very well, installation of eccodes and wgrib2 can be non-trivial, so I didn't want to use them as a solution. Also, for quick viewing of GRIB files there is NASA's Panoply (https://www.giss.nasa.gov/tools/panoply/), but that requires a Java installation.
> On Sep 23, 2024, at 11:31 PM, javad bayat <j.bayat194 using gmail.com> wrote:
>
> Dear R users;
> I have downloaded a GRIB file (Met.grib) and I want to export its
> data to an Excel file. I also want to do some math on some columns, but
> I got an error. I would be more than happy if anyone can help me do this. I
> have provided the code and the Met.grib file in this email.
> Sincerely yours
>
> # Load the necessary libraries
>> library(raster) # For reading GRIB files
>> library(dplyr) # For data manipulation
>> library(lubridate) # For date manipulation
>> library(openxlsx) # For writing Excel files
>
> # Specify the file paths
>> grib_file_path <- "C:/Users/Omrab_Lab/Downloads/Met.grib"
>> excel_file_path <- "C:/Users/Omrab_Lab/Downloads/Met_updated.xlsx"
>
> # Open the GRIB file
>> raster_data <- stack(grib_file_path)
>
> # Check the names of the layers to identify which ones to extract
>> layer_names <- names(raster_data)
>> print(layer_names) # Prints
>
>
>> # Extract layers based on layer names - adjust as necessary
>> t2m <- raster_data[[grep("t2m", layer_names)]]
>> d2m <- raster_data[[grep("d2m", layer_names)]]
>> tcc <- raster_data[[grep("tcc", layer_names)]]
>> valid_time <- raster_data[[grep("valid_time", layer_names)]]
>> t2m
> class : RasterStack
> nlayers : 0
>
>> # Check if the raster layers are loaded correctly
>> if (is.null(t2m) || is.null(d2m) || is.null(tcc) || is.null(valid_time))
> {
> + stop("One or more raster layers could not be loaded. Please check the
> layer names.")
> + }
>
>> # Convert raster values to vectors
>> t2m_values <- values(t2m)
> Error in dimnames(x) <- dn :
> length of 'dimnames' [2] not equal to array extent
>> d2m_values <- values(d2m)
> Error in dimnames(x) <- dn :
> length of 'dimnames' [2] not equal to array extent
>> tcc_values <- values(tcc)
> Error in dimnames(x) <- dn :
> length of 'dimnames' [2] not equal to array extent
>> valid_time_values <- values(valid_time)
> Error in dimnames(x) <- dn :
> length of 'dimnames' [2] not equal to array extent
>
> # Check for NA values and dimensions
> if (any(is.na(t2m_values)) || any(is.na(d2m_values)) || any(is.na(tcc_values))
> || any(is.na(valid_time_values))) {
> warning("One or more layers contain NA values. These will be removed.")
> }
>
> # Create the data frame, ensuring no NA values are included
> df <- data.frame(
> t2m = t2m_values,
> d2m = d2m_values,
> tcc = tcc_values,
> valid_time = valid_time_values,
> stringsAsFactors = FALSE
> )
>
> # Remove rows with NA values
> df <- na.omit(df)
>
> # Convert temperatures from Kelvin to Celsius
> df$t2m <- df$t2m - 273.15
> df$d2m <- df$d2m - 273.15
>
> # Calculate relative humidity
> calculate_relative_humidity <- function(t2m, d2m) {
> es <- 6.112 * exp((17.67 * t2m) / (t2m + 243.5))
> e <- 6.112 * exp((17.67 * d2m) / (d2m + 243.5))
> rh <- (e / es) * 100
> return(rh)
> }
> df$RH <- calculate_relative_humidity(df$t2m, df$d2m)
>
> # Convert valid_time from numeric to POSIXct, assuming it's in seconds
> # since the epoch
> df$valid_time <- as.POSIXct(df$valid_time, origin = "1970-01-01")
>
> # Extract year, month, day, and hour from valid_time
> df$Year <- year(df$valid_time)
> df$Month <- month(df$valid_time)
> df$Day <- day(df$valid_time)
> df$Hour <- hour(df$valid_time)
>
> # Select only the desired columns
> df_selected <- df %>% select(Year, Month, Day, Hour, tcc, t2m, RH)
>
> # Save the updated DataFrame to an Excel file
> write.xlsx(df_selected, excel_file_path, row.names = FALSE)
>
>
>
>
>
>
> --
> Best Regards
> Javad Bayat
> M.Sc. Environment Engineering
> Alternative Mail: bayat194 using yahoo.com
>
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new street address***
110 McAllister Way
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: Roy.Mendelssohn using noaa.gov www: https://www.pfeg.noaa.gov/
"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.