---
title: "Downloading from Copernicus Climate Service"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Downloading from Copernicus Climate Service}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
tryCatch({
  Sys.setlocale("LC_ALL", "English")
})
library(ggplot2)
theme_set(theme_light())
```

## Introduction

When obtaining data from Copernicus Climate Data Service you cannot download the data
directly. Instead, you need to know which data you want, submit a request for a specific
dataset, wait for the request to complete and, if it succeeds, download the data.

The `CopernicusClimate` package has functions to facilitate this process. This vignette
will walk you through the different steps to download data.

 * [Finding datasets](#finding-datasets)
 * [Requesting a dataset](#specifying-a-request)
 * [Tracking submitted requests](#tracking-submitted-requests)
 * [Download data](#downloading-data)

But before you can even get started, there are some things you have to prepare first,
as explained in the following section.

## Prerequisites

### Access token

This R package is built around the Application Programming Interface (API) provided by C3S.
Many of the features of this API require you to identify yourself, for which a 'key' or
API token is used. You can get one by creating an account at the Climate Data Store.
Once you have an account you can generate (or refresh) an API key.

You can use this token by means of the `token` argument in many of the functions of this package.
But rather than providing the key separately each time, you can use the key throughout your
R session by setting it once with `cds_set_token()`.

However, if you want to share your work, it is not very secure to keep your strictly personal
key hard-coded in your script. Furthermore, setting the key with `cds_set_token()` does
not persist across sessions.
Instead, you could set it as an option in your `.Rprofile` file, or as an environment
variable on your system. In both cases the variable should be named `CDSAPI_KEY`. This
variable is automatically picked up by `cds_get_token()`, so you don't have to specify
it anywhere in your script.

You can check if your token works with `cds_token_works()`:

```{r token}
library(CopernicusClimate)
message(
  "The machine that rendered this vignette ",
  ifelse(
    cds_token_works(),
    "has", "does not have"),
  " a working token")
```

### Licences

In order to download a dataset you need to accept its accompanying licence. You can use
`cds_dataset_form()` to inspect under which licence a dataset is provided, like so:

```{r get-licence, message=FALSE}
library(dplyr)
licence_info <-
  cds_dataset_form("reanalysis-era5-pressure-levels") |>
  filter(name == "licences")
licence_info <- licence_info$details[[1]]$details$licences[[1]]
print(licence_info)
```

You can accept this licence by calling
`cds_accept_licence(licence_info$id, licence_info$revision)`. You only need to
do this once for every licence. Accepted licences are stored with your account and
can be listed with `cds_accepted_licences()`.

Without accepting the required licences you cannot successfully submit a download
request for the dataset.

## Finding datasets

### Websites

If you want a visual interface for exploring available datasets, you can use
your web browser and visit either the
[Climate Data Store](https://cds.climate.copernicus.eu/datasets) or
[STAC catalogue](https://cds.climate.copernicus.eu/stac-browser/).
Both allow you to navigate through the wealth of available information and identify
which dataset best serves your needs.

### Programmatically

You can also use this R package to look for datasets.
You could start by listing them all:

```{r listing}
cds_list_datasets()
```

But you can also look for specific datasets using free search text and / or predefined
keywords:

```{r search}
cds_search_datasets(search = "rain", keywords = "Temporal coverage: Future")
```

Use `cds_catalogue_vocabulary()` to list available predefined keywords.

You will see that either approach results in a `data.frame` with a column named `id`.
You can use this `id` to refer to the dataset when setting up a download request.

### Favourite datasets

You can also mark your favourite datasets with a star using `cds_assign_star()`.
You can list your favourite datasets with `cds_starred()`. This makes it easier
to find datasets you use a lot. You can remove a star with `cds_remove_star()`.

## Specifying a request

In many cases you cannot download an entire dataset at once, because it is too large.
This means you have to specify the subset that you want.

### What are my options?

How do you know what options you have to subset a dataset? These options differ
for each dataset, so there is no straightforward answer. However, you can inspect
what options you have for a specific dataset. You can start by retrieving the
dataset's form with `cds_dataset_form()`.

```{r dataset-form}
dataset_form <- cds_dataset_form("reanalysis-era5-pressure-levels")
dataset_form
```

This results in a `data.frame` listing which aspects of a dataset you can select from.
Each row represents an aspect (except for the row with the `name` `"licences"`).
The column `details` contains information about the available values. You could for
instance look at the possible values for the `pressure_level`:

```{r possible-values}
values <-
  dataset_form |>
  filter(name == "pressure_level") |>
  pull("details")
values[[1]]$details$values |> unlist()
```

Using this information you can start building your request using `cds_build_request()`.
You can start by just specifying your dataset:

```{r full-request}
request <- cds_build_request("reanalysis-era5-pressure-levels")
summary(request)
```

The function `cds_build_request()` automatically adds all required parameters to the
request and fills each of them with its default value, if available, or with all allowed
values otherwise. The request built above will ask for the complete dataset in the default
product type, data format and download format. As I will explain in the following section,
this request will fail for most users. So let's narrow it down:

```{r specific-request}
request <- cds_build_request(
  "reanalysis-era5-pressure-levels",
  variable       = "temperature",
  pressure_level = "1000",
  year           = "2025",
  month          = "01",
  day            = "01",
  area           = c(n = 60, w = -5, e = 10, s = 40),
  data_format    = "netcdf")
summary(request)
```

This looks like a reasonable request.

### How much can I get?

As mentioned before, the amount of data that can be requested for each download is
restricted. To test how much a request would cost, you can call `cds_estimate_costs()`.
Using the example above, if you want to download the full dataset, the estimated costs
are as follows:

```{r estimate-full}
if (cds_token_works()) {
  cds_estimate_costs("reanalysis-era5-pressure-levels")
} else {
  message("You need a working token to estimate costs")
}
```

In this example the costs exceed the limit, so this request will fail.
If we estimate the costs for the more restricted request, we get:

```{r estimate-detailed}
if (cds_token_works()) {
  cds_estimate_costs(
    "reanalysis-era5-pressure-levels",
    variable       = "temperature",
    pressure_level = "1000",
    year           = "2025",
    month          = "01",
    day            = "01",
    area           = c(n = 60, w = -5, e = 10, s = 40),
    data_format    = "netcdf")
} else {
  message("You need a working token to estimate costs")
}
```

This is a request that we can afford.
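
If the data you need exceeds the limit, one possible workaround is to split the request
into smaller pieces and handle each piece separately. The following is a minimal sketch
in base R, assuming that one piece per month is small enough (this is not guaranteed and
should be checked with `cds_estimate_costs()`). The helper `split_by_month()` is
hypothetical and not part of the `CopernicusClimate` package:

```r
# Hypothetical helper (not part of CopernicusClimate): build one set of
# request parameters per month, so that each piece can be estimated and
# submitted separately.
split_by_month <- function(months, ...) {
  lapply(months, function(m) list(..., month = m))
}

pieces <- split_by_month(
  months         = sprintf("%02d", 1:3),
  variable       = "temperature",
  pressure_level = "1000",
  year           = "2025",
  day            = "01",
  data_format    = "netcdf")

# Number of pieces and the month selected by the second piece
length(pieces)
pieces[[2]]$month

# Each piece could then be passed on in the same way as in the chunks
# above, for instance (requires a working token):
# do.call(cds_estimate_costs,
#         c(list("reanalysis-era5-pressure-levels"), pieces[[1]]))
```

Splitting by month is only one choice. Depending on the dataset, splitting by year,
variable or pressure level may be more appropriate.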

## Submitting a request

Once you have established which dataset you want to download and how you wish to
subset it, you can submit a request to C3S. Let's submit the request as shown above:

```{r submit, message=FALSE}
if (cds_token_works()) {
  job <- cds_submit_job(
    "reanalysis-era5-pressure-levels",
    variable       = "temperature",
    pressure_level = "1000",
    year           = "2025",
    month          = "01",
    day            = "01",
    area           = c(n = 60, w = -5, e = 10, s = 40),
    data_format    = "netcdf")
  job
} else {
  message("You need a working token to submit a request")
}
```

By default this function will wait until the request has been processed by C3S.
But when you set the argument `wait = FALSE`, the function will return immediately.
This lets you submit multiple jobs without waiting for each individual request
to complete.

## Tracking submitted requests

When you submit a request and choose not to wait for it to complete, you may want to track
its progress. You can use `cds_list_jobs()` to list all your
submitted jobs. If you want the status of a specific job, you can use its identifier (id).
You received this id when you submitted the job. So we can have a look at the status
of the job submitted above:

```{r job-status}
if (cds_token_works()) {
  cds_list_jobs(job$jobID)
} else {
  message("You need a working token to get a job status")
}
```

## Downloading data

Now that we have submitted the request, we can download the result (if the job completed
successfully) with `cds_download_jobs()`. If you don't specify a job identifier, it will
download all (previously submitted) successful jobs. You can also download one or more
specific jobs. Note that this function uses parallel downloads, which should give you
some performance advantage when downloading multiple jobs.
For now let's try to download the submitted job:

```{r download, message=FALSE}
filename <- "result.nc"
if (cds_token_works()) {
  file_result <- cds_download_jobs(job$jobID, tempdir(), filename)
} else {
  message("Downloading data only works with a valid token")
}
```

Now you can do whatever it is you want to do with the data:

```{r plot, fig.width=7, fig.height=3}
fn <- file.path(tempdir(), filename)
if (file.exists(fn)) {
  library(stars)
  library(ggplot2)
  result <- read_mdim(fn)
  ggplot() +
    geom_stars(data = result) +
    coord_sf() +
    facet_wrap(~strftime(valid_time, "%H:%M")) +
    scale_fill_viridis_c(option = "turbo") +
    labs(x = NULL, y = NULL, fill = "Temperature [K]")
} else {
  message("File wasn't downloaded")
}
```
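
The temperatures in this example are expressed in kelvin, as the legend label above
indicates. If you prefer degrees Celsius, the conversion is a fixed offset. The following
is a minimal sketch on a plain numeric vector. The helper `kelvin_to_celsius()` is
hypothetical and not part of any package used here, and the values are illustrative
rather than taken from the downloaded file. Data read with `read_mdim()` may carry units
metadata and could need different handling:

```r
# Hypothetical helper: convert temperatures from kelvin to degrees Celsius
kelvin_to_celsius <- function(k) k - 273.15

# Illustrative values only, not taken from the downloaded file
temps_k <- c(273.15, 283.15, 300)
temps_c <- kelvin_to_celsius(temps_k)
temps_c
```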