The hdar R package provides seamless access to the WEkEO Harmonised Data Access (HDA) API, enabling users to efficiently query, download, and process data from the HDA platform.
To use the HDA service and the hdar package, you must first register for a WEkEO account. The HDA service is free for all WEkEO users. Registration is straightforward and can be completed through the following link: Register for WEkEO. Once your account is set up, you can access the HDA services immediately.
To start using the hdar package, you first need to install and load it in your R environment.
To interact with the HDA service, you need to authenticate by providing your username and password. The Client class allows you to pass these credentials directly and optionally save them to a configuration file for future use. If credentials are not specified as parameters, the client will read them from the ~/.hdarc file.
You can create an instance of the Client class by passing your username and password directly. The initialization method has an optional parameter save_credentials that specifies whether the provided credentials should be saved in the ~/.hdarc configuration file. By default, save_credentials is set to FALSE.
Here is an example of how to authenticate by passing the user and password, and optionally saving these credentials:
# Define your username and password
username <- "your_username"
password <- "your_password"
# Create an instance of the Client class and save credentials to a config file
# The save_credentials parameter is optional and defaults to FALSE
client <- Client$new(username, password, save_credentials = TRUE)
If the save_credentials parameter is set to TRUE, the credentials will be saved in the ~/.hdarc file, making it easier to authenticate in future sessions without passing the credentials again.
Once the client is created, you can check whether it has been authenticated properly by calling the token() method, which verifies authentication.
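A minimal sketch of such a check, wrapped in a helper so that a failed login does not stop a script. The helper name is_authenticated is hypothetical, and the sketch assumes that token() signals an error when authentication fails and returns a non-NULL value when it succeeds; neither behavior is confirmed here. Stand-in objects replace a real client so the snippet runs without credentials.

```r
# Hypothetical helper: TRUE if client$token() returns a value without error.
# Assumption: token() raises an error when authentication fails.
is_authenticated <- function(client) {
  tryCatch(
    {
      tok <- client$token()
      !is.null(tok)
    },
    error = function(e) FALSE
  )
}

# Stand-ins for a real Client instance, used only for illustration
ok_client  <- list(token = function() "dummy-token")
bad_client <- list(token = function() stop("401 Unauthorized"))

is_authenticated(ok_client)   # TRUE
is_authenticated(bad_client)  # FALSE
```

With a real client, the same helper would be called as is_authenticated(client).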
By using one of these methods, you can securely authenticate with the HDA service and start making requests.
To interact with the HDA service, you will often need to find datasets available on WEkEO. The Client class provides a method called datasets that lists available datasets, optionally filtered by a text pattern.
The basic usage of the datasets method is straightforward. You can retrieve a list of all datasets available on WEkEO by calling the datasets method on an instance of the Client class.
You can also filter the datasets by providing a text pattern. This is useful when you are looking for datasets that match a specific keyword or phrase.
# Assuming 'client' is already created and authenticated
# client <- Client$new()
pattern <- "Seasonal Trajectories"
filtered_datasets <- client$datasets(pattern)
# List dataset IDs
sapply(filtered_datasets, FUN = function(x) x$dataset_id)
[1] "EO:EEA:DAT:CLMS_HRVPP_VPP-LAEA" "EO:EEA:DAT:CLMS_HRVPP_ST"
[3] "EO:EEA:DAT:CLMS_HRVPP_ST-LAEA" "EO:EEA:DAT:CLMS_HRVPP_VPP"
The datasets method returns a list containing datasets and associated information. This information may include dataset names, descriptions, and other metadata.
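As a sketch of how such a list can be handled in base R, the snippet below pulls the IDs into a character vector with vapply() and filters them locally. The datasets object is a hand-built stand-in that carries only the dataset_id field used in the example above; a real result holds more metadata, whose field names are not assumed here.

```r
# Stand-in for the list returned by client$datasets(); only dataset_id is modeled
datasets <- list(
  list(dataset_id = "EO:EEA:DAT:CLMS_HRVPP_VPP-LAEA"),
  list(dataset_id = "EO:EEA:DAT:CLMS_HRVPP_ST"),
  list(dataset_id = "EO:EEA:DAT:CLMS_HRVPP_ST-LAEA"),
  list(dataset_id = "EO:EEA:DAT:CLMS_HRVPP_VPP")
)

# vapply() always returns a character vector here, whereas sapply() may not
ids <- vapply(datasets, function(x) x$dataset_id, character(1))

# Keep only the IDs that end in "-LAEA"
laea_ids <- ids[grepl("-LAEA$", ids)]
laea_ids
```

Using vapply() instead of sapply() makes the result type predictable even when the list is empty.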
To search for data in the HDA service, you need to create a query template. Manually creating a query template can be tedious as it involves reading documentation and learning about possible parameters. To simplify this process, the generate_query_template function is provided to automate the creation of query templates for a given dataset.
The generate_query_template function generates a template of a query for a specified dataset. This function fetches information about existing parameters, default values, etc., from the /queryable endpoint of the HDA service.
Here is an example of how to generate a query template for the dataset with the ID “EO:EEA:DAT:CLMS_HRVPP_ST”:
# client <- Client$new()
query_template <- client$generate_query_template("EO:EEA:DAT:CLMS_HRVPP_ST")
query_template
{
  "dataset_id": "EO:EEA:DAT:CLMS_HRVPP_ST",
  "uid": "__### Value of string type with pattern: [\\w-]+",
  "productType": "PPI",
  "platformSerialIdentifier": "S2A, S2B",
  "tileId": "__### Value of string type with pattern: [\\w-]+",
  "productVersion": "__### Value of string type with pattern: [\\w-]+",
  "resolution": "10",
  "processingDate": "__### Value of string",
  "start": "__### Value of string",
  "end": "__### Value of string",
  "bbox": [
    -180,
    -90,
    180,
    90
  ]
}
You can and should customize the generated query template to fit your specific needs. Fields whose values start with __### are placeholders indicating possible values. If these placeholders are left unchanged, the corresponding fields are automatically removed before the query is sent to the HDA service.
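To make the placeholder rule concrete, here is a minimal sketch of that cleanup in base R. It is illustrative only: the package does this itself, the helper name drop_placeholders is hypothetical, and the actual implementation may differ.

```r
# Illustrative only: hdar performs this cleanup itself before sending a query.
# Hypothetical helper that drops every field whose value is still a placeholder.
drop_placeholders <- function(query) {
  is_placeholder <- vapply(
    query,
    function(v) is.character(v) && length(v) == 1 && startsWith(v, "__###"),
    logical(1)
  )
  query[!is_placeholder]
}

# A shortened copy of the template shown above
query <- list(
  dataset_id = "EO:EEA:DAT:CLMS_HRVPP_ST",
  uid = "__### Value of string type with pattern: [\\w-]+",
  productType = "PPI",
  start = "__### Value of string",
  bbox = c(-180, -90, 180, 90)
)

names(drop_placeholders(query))
# [1] "dataset_id"  "productType" "bbox"
```

Only uid and start are dropped, because their values were never replaced with real ones.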
To modify the query, it is often easier to transform the JSON into an R list using the jsonlite::fromJSON() function:
# convert to list for easier manipulation in R
library(jsonlite)
query_template <- fromJSON(query_template)
query_template
$dataset_id
[1] "EO:EEA:DAT:CLMS_HRVPP_ST"
$uid
[1] "__### Value of string type with pattern: [\\w-]+"
$productType
[1] "PPI"
$platformSerialIdentifier
[1] "S2A, S2B"
$tileId
[1] "__### Value of string type with pattern: [\\w-]+"
$productVersion
[1] "__### Value of string type with pattern: [\\w-]+"
$resolution
[1] "10"
$processingDate
[1] "__### Value of string"
$start
[1] "__### Value of string"
$end
[1] "__### Value of string"
$bbox
[1] -180 -90 180 90
Here is an example of how to adjust the query template before using it in a search:
# set a new bbox
query_template$bbox <- c(11.1090, 46.6210, 11.2090, 46.7210)
# limit the time range
query_template$start <- "2018-03-01T00:00:00.000Z"
query_template$end <- "2018-05-31T00:00:00.000Z"
query_template
$dataset_id
[1] "EO:EEA:DAT:CLMS_HRVPP_ST"
$uid
[1] "__### Value of string type with pattern: [\\w-]+"
$productType
[1] "PPI"
$platformSerialIdentifier
[1] "S2A, S2B"
$tileId
[1] "__### Value of string type with pattern: [\\w-]+"
$productVersion
[1] "__### Value of string type with pattern: [\\w-]+"
$resolution
[1] "10"
$processingDate
[1] "__### Value of string"
$start
[1] "2018-03-01T00:00:00.000Z"
$end
[1] "2018-05-31T00:00:00.000Z"
$bbox
[1] 11.109 46.621 11.209 46.721
Once you have made the necessary modifications, you can convert the list back to JSON format with the jsonlite::toJSON() function. It’s crucial to set auto_unbox = TRUE when converting back to JSON. Without it, jsonlite serializes every length-one vector as a single-element array (for example, "productType": ["PPI"] instead of "productType": "PPI"), which no longer matches the shape of the generated template.
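The difference is easy to see on a small list. The sketch below uses a shortened stand-in for the query (named query here for illustration) and compares the two serializations:

```r
library(jsonlite)

# Shortened stand-in for the modified query template
query <- list(
  dataset_id = "EO:EEA:DAT:CLMS_HRVPP_ST",
  productType = "PPI",
  bbox = c(11.109, 46.621, 11.209, 46.721)
)

# Default: length-one vectors become one-element JSON arrays,
# e.g. "productType":["PPI"]
boxed <- toJSON(query)

# auto_unbox = TRUE: length-one vectors become JSON scalars,
# e.g. "productType":"PPI"; bbox has four elements and stays an array
unboxed <- toJSON(query, auto_unbox = TRUE)

boxed
unboxed
```

Note that toJSON() rounds numbers to four decimal digits by default; if your coordinates need more precision, the digits argument (for example digits = NA for maximum precision) controls this.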
To search for data in the HDA service, you can use the search function provided by the Client class. This function allows you to search for datasets based on a query and optionally limit the number of results. The search results can then be downloaded using the download method of the SearchResults class.
The search function takes a query and an optional limit parameter, which specifies the maximum number of results you want to retrieve. The function only searches for data and does not download it. The output of this function is an instance of the SearchResults class.
Here is an example of how to search for data using the query prepared above:
# Assuming 'client' is already created and authenticated, and 'query_template' is defined
matches <- client$search(query_template)
[1] "Found 9 files"
[1] "Total Size 1.8 GB"
sapply(matches$results,FUN = function(x){x$id})
[1] "ST_20180301T000000_S2_T32TPS-010m_V101_PPI" "ST_20180311T000000_S2_T32TPS-010m_V101_PPI"
[3] "ST_20180321T000000_S2_T32TPS-010m_V101_PPI" "ST_20180401T000000_S2_T32TPS-010m_V101_PPI"
[5] "ST_20180411T000000_S2_T32TPS-010m_V101_PPI" "ST_20180421T000000_S2_T32TPS-010m_V101_PPI"
[7] "ST_20180501T000000_S2_T32TPS-010m_V101_PPI" "ST_20180511T000000_S2_T32TPS-010m_V101_PPI"
[9] "ST_20180521T000000_S2_T32TPS-010m_V101_PPI"
The SearchResults class has a public field results and a method called download that is responsible for downloading the found data. The download() function takes an output directory (which is created if it doesn’t already exist) and includes an optional force parameter.
When force is set to TRUE, the function will re-download the files even if they already exist in the output directory, overwriting the existing files. If force is set to FALSE (the default), the function will skip downloading files that already exist, saving time and bandwidth.
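The skip-or-overwrite rule can be sketched in a few lines of base R. This mirrors the behavior described above and is not the package's own code; the helper name files_to_download and the file names are hypothetical.

```r
# Hypothetical helper: decide which files still need to be fetched.
# It mirrors the documented behavior of `force`; it is not hdar's own code.
files_to_download <- function(file_names, output_dir, force = FALSE) {
  # Create the output directory if it does not exist yet
  if (!dir.exists(output_dir)) {
    dir.create(output_dir, recursive = TRUE)
  }
  if (force) {
    return(file_names)
  }
  already_there <- file.exists(file.path(output_dir, file_names))
  file_names[!already_there]
}

# Demonstration in a scratch subdirectory of the session temp dir
demo_dir <- file.path(tempdir(), "force_demo")
wanted <- c("a.tif", "b.tif")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, "a.tif")))

skipped <- files_to_download(wanted, demo_dir)                # "b.tif"
forced  <- files_to_download(wanted, demo_dir, force = TRUE)  # "a.tif" "b.tif"

# Remove the scratch directory again
unlink(demo_dir, recursive = TRUE)
```

With force = FALSE only the missing file is selected; with force = TRUE both are, so the existing copy would be overwritten.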
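# Assuming 'matches' is an instance of SearchResults obtained from the search
# Use a dedicated subdirectory (illustrative name) so that cleaning up later leaves tempdir() itself intact
odir <- file.path(tempdir(), "hdar_downloads")
matches$download(odir)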
The total size is 1.8 GB . Do you want to proceed? (Y/N):
y
[1] "[Download] Start"
[1] "[Download] Downloading file 1/9"
[1] "[Download] Downloading file 2/9"
[1] "[Download] Downloading file 3/9"
[1] "[Download] Downloading file 4/9"
[1] "[Download] Downloading file 5/9"
[1] "[Download] Downloading file 6/9"
[1] "[Download] Downloading file 7/9"
[1] "[Download] Downloading file 8/9"
[1] "[Download] Downloading file 9/9"
[1] "[Download] DONE"
# List the downloaded files
list.files(odir)
[1] "ST_20180301T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180311T000000_S2_T32TPS-010m_V101_PPI.tif"
[3] "ST_20180321T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180401T000000_S2_T32TPS-010m_V101_PPI.tif"
[5] "ST_20180411T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180421T000000_S2_T32TPS-010m_V101_PPI.tif"
[7] "ST_20180501T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180511T000000_S2_T32TPS-010m_V101_PPI.tif"
[9] "ST_20180521T000000_S2_T32TPS-010m_V101_PPI.tif"
# Remove the downloaded files
unlink(odir, recursive = TRUE)