---
title: "Downloading from Copernicus Climate Service"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Downloading from Copernicus Climate Service}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
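# Try to switch to an English locale; ignore failures on systems where
# a locale named "English" is not available
invisible(tryCatch(
  Sys.setlocale("LC_ALL", "English"),
  warning = function(w) NULL,
  error   = function(e) NULL
))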
library(ggplot2)
theme_set(theme_light())
```

## Introduction

When obtaining data from the Copernicus Climate Data Store (CDS), you cannot download
the data directly. Instead, you need to decide which data you want, submit a request for
a specific dataset, wait for the request to complete and, if it succeeds, download
the data. The `CopernicusClimate` package has functions to facilitate each of these steps.
This vignette walks you through them:

 * [Finding datasets](#finding-datasets)
 * [Specifying a request](#specifying-a-request)
 * [Submitting a request](#submitting-a-request)
 * [Tracking submitted requests](#tracking-submitted-requests)
 * [Downloading data](#downloading-data)

Before you can get started, however, you need to prepare a few things,
as explained in the following section.

## Prerequisites

### Access token

This R package is built around the Application Programming Interface (API) provided
by the Copernicus Climate Change Service (C3S). Many of the features of this API require
you to identify yourself, for which a 'key' or API token is used. You can get one by
creating an account at <https://cds.climate.copernicus.eu/profile>.

Once you have an account, you can generate (or refresh) an API key. You can pass this
key via the `token` argument of many of the functions in this package. But rather than
providing the key each time, you can set it once for your entire R session with
`cds_set_token()`.

However, if you want to share your work, it is not secure to hard-code your strictly
personal key in your script. Furthermore, a key set with `cds_set_token()` does not
persist across sessions. Instead, you could set it as an option in your `.Rprofile`
file, or as an environment variable on your system. In both cases the variable should
be named `CDSAPI_KEY`. This variable is automatically picked up by `cds_get_token()`,
so you don't have to specify it anywhere in your script.
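As a sketch of the environment-variable approach, assuming the `CDSAPI_KEY` name described above: add a line such as `CDSAPI_KEY=your-personal-key` to your `~/.Renviron` file, which R reads at start-up. The helper `has_cds_key()` below is hypothetical (it is not part of the package) and only checks whether a key with that name is visible to R, without revealing it. Whether the key is actually valid is still something to check with `cds_token_works()`.

```{r check-key}
# Hypothetical helper (not part of CopernicusClimate): report whether a key
# named CDSAPI_KEY is available as an R option or as an environment variable,
# without printing the key itself.
has_cds_key <- function() {
  from_option <- getOption("CDSAPI_KEY", default = "")
  from_env    <- Sys.getenv("CDSAPI_KEY", unset = "")
  nzchar(from_option) || nzchar(from_env)
}

# Persistent set-up (assumed placeholders, replace with your own key):
#   in ~/.Renviron:  CDSAPI_KEY=your-personal-key
#   in ~/.Rprofile:  options(CDSAPI_KEY = "your-personal-key")

has_cds_key()
```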

You can check if your token works with `cds_token_works()`:

```{r token}
library(CopernicusClimate)

message(
  "The machine that rendered this vignette ",
  if (cds_token_works()) "has" else "does not have",
  " a working token"
)
```

### Licences

Before you can download a dataset, you need to accept its accompanying licence(s).
You can use `cds_dataset_form()` to inspect under which licence a dataset
is provided, like so:

```{r get-licence, message=FALSE}
library(dplyr)

licence_info <-
  cds_dataset_form("reanalysis-era5-pressure-levels") |>
    filter(name == "licences")

licence_info <- licence_info$details[[1]]$details$licences[[1]]
print(licence_info)
```

You can accept this licence by calling
`cds_accept_licence(licence_info$id, licence_info$revision)`. You only need to do
this once for every licence. Accepted licences are stored with your account and
can be listed with `cds_accepted_licences()`. Without accepting the required licences,
any request to download the dataset will fail.

## Finding datasets

### Websites

If you want a visual interface for exploring available datasets, you can use
your web browser and visit either the
[Climate Data Store](https://cds.climate.copernicus.eu/datasets) or the
[STAC catalogue](https://cds.climate.copernicus.eu/stac-browser/).
Both allow you to navigate through the treasures of information, and identify which
dataset best serves your needs.

### Programmatically

You can also use this R package to look for datasets. You could start by listing them
all:

```{r listing}
cds_list_datasets()
```

But you can also look for specific datasets using free-text search and/or predefined
keywords:

```{r search}
cds_search_datasets(search = "rain", keywords = "Temporal coverage: Future")
```

Use `cds_catalogue_vocabulary()` to list available predefined keywords.

Either approach returns a `data.frame` with a column named `id`.
Use this `id` to refer to a dataset when setting up a download request.

### Favourite datasets

You can also mark your favourite datasets with a star using `cds_assign_star()`, and
list them with `cds_starred()`. This makes it easier to find datasets you use a lot.
You can remove a star with `cds_remove_star()`.

## Specifying a request

In many cases you cannot download an entire dataset at once, because it is
too large. This means you have to specify the subset you want.

### What are my options?

How do you know which options you have to subset a dataset? These options differ
for each dataset, so there is no single answer. However, you can inspect the
options for a specific dataset, starting with its form as returned by
`cds_dataset_form()`:

```{r dataset-form}
dataset_form <-
  cds_dataset_form("reanalysis-era5-pressure-levels")

dataset_form
```

This results in a `data.frame` listing which aspects of a dataset you can
select from. Each row represents an aspect (except for the row with the `name`
`"licences"`). The column `details` contains information about the available
values. You could for instance look at the possible values for the `pressure_level`:

```{r possible-values}
values <-
  dataset_form |>
  filter(name == "pressure_level") |>
  pull("details")

values[[1]]$details$values |> unlist()
```
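Before building a request, you could check your intended values against these allowed values. The sketch below defines a hypothetical helper, `check_allowed()`, which is not part of the package: it compares the requested values with the allowed ones as text (so `1000` and `"1000"` match) and stops with a message listing any value that is not allowed.

```{r check-allowed}
# Hypothetical helper (not part of CopernicusClimate): stop early when a
# requested value is not among the allowed values listed in the dataset form.
check_allowed <- function(requested, allowed, aspect = "value") {
  invalid <- setdiff(as.character(requested), as.character(allowed))
  if (length(invalid) > 0) {
    stop("Unsupported ", aspect, ": ", paste(invalid, collapse = ", "))
  }
  invisible(TRUE)
}
```

For example, `check_allowed(c("1000", "850"), values[[1]]$details$values |> unlist(), "pressure_level")` should pass silently if both levels are listed in the form, and raise an error otherwise.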

Using this information you can start building your request using `cds_build_request()`.
You can start by just specifying your dataset:

```{r full-request}
request <- cds_build_request("reanalysis-era5-pressure-levels")
summary(request)
```

The function `cds_build_request()` automatically adds all required parameters
to the request and fills them with either their default value, if available, or all allowed
values otherwise. The request built above therefore asks for the complete dataset
in the default product type, data format and download format. As I will explain in the
following section, this request will fail for most users. So let's narrow it down:

```{r specific-request}
request <- cds_build_request(
  "reanalysis-era5-pressure-levels",
  variable       = "temperature",
  pressure_level = "1000",
  year           = "2025",
  month          = "01",
  day            = "01",
  area           = c(n = 60, w = -5, e = 10, s = 40),
  data_format    = "netcdf")
summary(request)
```

This looks like a reasonable request.
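The `area` argument describes a bounding box through its named northern (`n`), western (`w`), eastern (`e`) and southern (`s`) edges, in degrees. A cheap sanity check before submitting can catch swapped edges. The helper `check_area()` below is hypothetical (not part of the package) and, for simplicity, rejects boxes where `w` is not smaller than `e`, so it does not handle areas crossing the antimeridian.

```{r check-area}
# Hypothetical helper (not part of CopernicusClimate): sanity-check a named
# bounding box before it is used in a request. Boxes crossing the
# antimeridian (w >= e) are deliberately rejected to keep the check simple.
check_area <- function(area) {
  stopifnot(
    is.numeric(area),
    length(area) == 4,
    setequal(names(area), c("n", "w", "e", "s"))
  )
  if (any(abs(area[c("n", "s")]) > 90)) stop("Latitudes must lie within [-90, 90]")
  if (area[["n"]] <= area[["s"]]) stop("'n' must be larger than 's'")
  if (area[["w"]] >= area[["e"]]) stop("'w' must be smaller than 'e'")
  invisible(area)
}

check_area(c(n = 60, w = -5, e = 10, s = 40))
```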

### How much can I get?

As mentioned before, the amount of data that can be requested for each
download is restricted. To check how much a request would cost, you can
call `cds_estimate_costs()`. Using the example above, if you want to
download the full dataset, the estimated costs are as follows:

```{r estimate-full}
if (cds_token_works()) {
  cds_estimate_costs("reanalysis-era5-pressure-levels")
} else {
  message("You need a working token to estimate costs")
}
```

In this example the costs exceed the limit, so this request
would fail. If we estimate the costs for the more restricted request,
we get:

```{r estimate-detailed}
if (cds_token_works()) {
  cds_estimate_costs(
    "reanalysis-era5-pressure-levels",
    variable       = "temperature",
    pressure_level = "1000",
    year           = "2025",
    month          = "01",
    day            = "01",
    area           = c(n = 60, w = -5, e = 10, s = 40),
    data_format    = "netcdf")
} else {
  message("You need a working token to estimate costs")
}
```

This is a request that we can afford.

## Submitting a request

Once you have established which dataset you want to download and how you wish to subset
it, you can submit a request to C3S. Let's submit the request shown above:

```{r submit, message=FALSE}
if (cds_token_works()) {
  job <-
    cds_submit_job(
      "reanalysis-era5-pressure-levels",
      variable       = "temperature",
      pressure_level = "1000",
      year           = "2025",
      month          = "01",
      day            = "01",
      area           = c(n = 60, w = -5, e = 10, s = 40),
      data_format    = "netcdf")
  job
} else {
  message("You need a working token to submit a request")
}
```

By default this function waits until the request has been processed by
C3S. But when you set the argument `wait = FALSE`, the function returns
immediately. This allows you to submit multiple jobs without having to wait
for each individual request to complete.
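A minimal sketch of submitting several smaller jobs without waiting, assuming you want one job per month for the subset used above. The helper `monthly_requests()` is hypothetical (not part of the package); it builds one argument list per month, and `do.call()` passes each list to `cds_submit_job()` together with `wait = FALSE`. The chunk is not evaluated when building this vignette, since it would submit real jobs, and it assumes each returned job exposes `jobID` as used elsewhere in this vignette.

```{r monthly-jobs, eval = FALSE}
# Hypothetical helper (not part of CopernicusClimate): build one set of
# request arguments per month, reusing the subset from the example above.
monthly_requests <- function(year, months = 1:12) {
  lapply(sprintf("%02d", months), function(m) {
    list(
      variable       = "temperature",
      pressure_level = "1000",
      year           = year,
      month          = m,
      day            = "01",
      area           = c(n = 60, w = -5, e = 10, s = 40),
      data_format    = "netcdf"
    )
  })
}

requests <- monthly_requests("2024", months = 1:3)

# Submit every request without waiting; skipped when the package or a
# working token is unavailable.
if (requireNamespace("CopernicusClimate", quietly = TRUE) &&
    CopernicusClimate::cds_token_works()) {
  jobs <- lapply(requests, function(args) {
    do.call(
      CopernicusClimate::cds_submit_job,
      c(list("reanalysis-era5-pressure-levels"), args, list(wait = FALSE))
    )
  })
  # Assumes each returned job exposes `jobID`, as used elsewhere in this vignette
  job_ids <- unlist(lapply(jobs, function(j) j$jobID))
}
```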

## Tracking submitted requests

If you submit a request and choose not to wait for it to complete, you
may want to track its progress. You can use `cds_list_jobs()` to list all
your submitted jobs. If you want the status of a specific job, pass its
identifier (id), which is returned when you submit the job (here `job$jobID`).
So we can have a look at the status of our job submitted above:

```{r job-status}
if (cds_token_works()) {
  cds_list_jobs(job$jobID)
} else {
  message("You need a working token to get a job status")
}
```
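If you submitted jobs with `wait = FALSE`, you may want to poll until a job finishes. Below is a sketch of a generic polling loop; the helper `wait_for_job()` is hypothetical (not part of the package), and the final job states are assumed to follow the OGC API Processes standard. It takes any function that returns the current status as a single string, sleeps `interval` seconds between checks, and gives up once the accumulated sleep time reaches `timeout` seconds.

```{r wait-for-job}
# Hypothetical helper (not part of CopernicusClimate): call `get_status()`
# until it returns a final state, or give up once the accumulated sleep time
# reaches `timeout` seconds. The final states are assumptions based on the
# OGC API Processes job states.
wait_for_job <- function(get_status, interval = 10, timeout = 600,
                         final = c("successful", "failed", "dismissed")) {
  waited <- 0
  repeat {
    status <- get_status()
    if (status %in% final) return(status)
    if (waited >= timeout) stop("Timed out; last status: ", status)
    Sys.sleep(interval)
    waited <- waited + interval
  }
}
```

For example, assuming the result of `cds_list_jobs()` contains a `status` column, you could call `wait_for_job(function() cds_list_jobs(job$jobID)$status)`.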

## Downloading data

Once the request has completed successfully, you can download the result
with `cds_download_jobs()`. If you don't specify a job identifier, it will
download all previously submitted successful jobs. You can also download one
or more specific jobs. Note that this function downloads in parallel, which
should give you some performance advantage when downloading multiple jobs.
For now, let's download the job submitted above:

```{r download, message=FALSE}
filename <- "result.nc"
if (cds_token_works()) {
  file_result <- cds_download_jobs(job$jobID, tempdir(), filename)
} else {
  message("Downloading data only works with a valid token")
}
```

Now you can do whatever you want with the data, for example plot it:

```{r plot, fig.width=7, fig.height=3}
fn <- file.path(tempdir(), filename)

if (file.exists(fn)) {
  
  library(stars)
  library(ggplot2)
  
  result <- read_mdim(fn)
  
  ggplot() +
    geom_stars(data = result) +
    coord_sf() +
    facet_wrap(~strftime(valid_time, "%H:%M")) +
    scale_fill_viridis_c(option = "turbo") +
    labs(x = NULL, y = NULL, fill = "Temperature [K]")

} else {
  message("File wasn't downloaded")
}
```