---
title: rerddap introduction
author: Scott Chamberlain
date: "2023-06-29"
output: rmarkdown::html_vignette
vignette: >
    %\VignetteIndexEntry{rerddap introduction}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
---




`rerddap` is a general purpose R client for working with ERDDAP™ servers. ERDDAP™ is a server built on top of OPenDAP, which serves some NOAA data. You can get gridded data ([griddap](https://upwell.pfeg.noaa.gov/erddap/griddap/documentation.html)), which lets you query from gridded datasets, or table data ([tabledap](https://upwell.pfeg.noaa.gov/erddap/tabledap/documentation.html)) which lets you query from tabular datasets. In terms of how we interface with them, there are similarties, but some differences too. We try to make a similar interface to both data types in `rerddap`.

## NetCDF

`rerddap` supports NetCDF format, and is the default when using the `griddap()` function. NetCDF is a binary file format, and will have a much smaller footprint on your disk than csv. The binary file format means it's harder to inspect, but the `ncdf4` package makes it easy to pull data out and write data back into a NetCDF file. Note the the file extension for NetCDF files is `.nc`. Whether you choose NetCDF or csv for small files won't make much of a difference, but will with large files.

## Caching

Data files downloaded are cached in a single hidden directory `~/.rerddap` on your machine. It's hidden so that you don't accidentally delete the data, but you can still easily delete the data if you like.

When you use `griddap()` or `tabledap()` functions, we construct a MD5 hash from the base URL, and any query parameters - this way each query is separately cached. Once we have the hash, we look in `~/.rerddap` for a matching hash. If there's a match we use that file on disk - if no match, we make a http request for the data to the ERDDAP™ server you specify.

## ERDDAP™ servers

You can get a data.frame of ERDDAP™ servers using the function `servers()`. The list of ERDDAP™ servers is drawn from the *Awesome ERDDAP™* page maintained by the Irish Marine Institute .  If you know of more ERDDAP™ servers, follow the instructions on that page to add the server.

## Install

Stable version from CRAN


```r
install.packages("rerddap")
```

Or, the development version from GitHub


```r
remotes::install_github("ropensci/rerddap")
```


```r
library("rerddap")
```

## Search

First, you likely want to search for data, specify either `griddadp` or `tabledap`


```r
ed_search(query = 'size', which = "table")
#> # A tibble: 36 × 2
#>    title                                                              dataset_id
#>    <chr>                                                              <chr>     
#>  1 CCE Prey Size and Hard Part Size Regressions                       mmtdPreyS…
#>  2 CCE Teleost Prey Size and Hard Part Size Measurements              mmtdTeleo…
#>  3 CalCOFI Larvae Sizes                                               erdCalCOF…
#>  4 CCE Non-Teleost Prey Size and Hard Part Size Measurements          mmtdNonTe…
#>  5 Channel Islands, Kelp Forest Monitoring, Size and Frequency, Natu… erdCinpKf…
#>  6 File Names from the AWS S3 noaa-goes16 Bucket                      awsS3Noaa…
#>  7 File Names from the AWS S3 noaa-goes17 Bucket                      awsS3Noaa…
#>  8 PacIOOS Beach Camera 001: Waikiki, Oahu, Hawaii                    BEACHCAM-…
#>  9 PacIOOS Beach Camera 003: Waimea Bay, Oahu, Hawaii                 BEACHCAM-…
#> 10 PacIOOS Beach Camera 004: Waimea Bay (Offshore), Oahu, Hawaii      BEACHCAM-…
#> # ℹ 26 more rows
```


```r
ed_search(query = 'size', which = "grid")
#> # A tibble: 103 × 2
#>    title                                                              dataset_id
#>    <chr>                                                              <chr>     
#>  1 Audio data from a local source.                                    testGridW…
#>  2 Main Hawaiian Islands Multibeam Bathymetry Synthesis: 50-m Bathym… hmrg_bath…
#>  3 Main Hawaiian Islands Multibeam Bathymetry Synthesis: 50-m Bathym… hmrg_bath…
#>  4 Coastal Upwelling Transport Index (CUTI), Daily                    erdCUTIda…
#>  5 SST smoothed frontal gradients                                     FRD_SSTgr…
#>  6 Coastal Upwelling Transport Index (CUTI), Monthly                  erdCUTImo…
#>  7 SST smoothed frontal gradients, Lon0360                            FRD_SSTgr…
#>  8 Biologically Effective Upwelling Transport Index (BEUTI), Daily    erdBEUTId…
#>  9 Biologically Effective Upwelling Transport Index (BEUTI), Monthly  erdBEUTIm…
#> 10 Daily averaged and put on grid 4x daily NCEP reanalysis (psi.2012) noaa_psl_…
#> # ℹ 93 more rows
```

There is now a convenience function to search over a list of ERDDAP™ servers,  designed to work with the function `servers()`


```r
server_list <- c(
  emodnet_physics = 'https://erddap.emodnet-physics.eu/erddap/',
  irish_marine_institute = 'https://erddap.marine.ie/erddap/'
)
global_search(query = 'size', server_list, 'griddap')
#>                                                                                                                                                              title
#> 1                                          EMODnet Physics - Total Suspended Matter - GridSeriesObservation - Concentration of total suspended matter - BALTIC SEA
#> 2                                   EMODnet Physics - Total Suspended Matter - GridSeriesObservation - Concentration of total suspended matter - MEDITERRANEAN SEA
#> 3                  EMODnet Physics - Total Suspended Matter - GridSeriesObservation - Concentration of total suspended matter - MEDITERRANEAN SEA - LOW RESOLUTION
#> 4                                                                                                           EMODnet Physics - TEMPERATURE YEARLY RECORDING DENSITY
#> 5                                                                  EMODPACE - Monthly sea level derived from CMEMS-DUACS (DT-2018) satellite altimetry (1993-2019)
#> 6                                           EMODnet Physics - Total Suspended Matter - GridSeriesObservation - Concentration of total suspended matter - NORTH SEA
#> 7                                                       EMODPACE - Absolute sea level trend (1993 – 2019) - derived from CMEMS-DUACS (DT-2018) satellite altimetry
#> 8 EMODPACE - Sea Level monthly mean, EurAsia. This product is based, uses and reprocess the CMEMS product id. SEALEVEL_GLO_PHY_CLIMATE_L4_REP_OBSERVATIONS_008_057
#> 9                                                                                                                                 COMPASS-NEATL Hindcast 2016-2020
#>                                  dataset_id
#> 1                             TSM_BALTICSEA
#> 2                                 TSM_MBSEA
#> 3                   TSM_MBSEA_LOWRESOLUTION
#> 4                           ERD_EP_TEMP_DNS
#> 5 EMODPACE_SLEV_MONTHLY_MEAN_DESEASONALIZED
#> 6                              TSM_NORTHSEA
#> 7                       EMODPACE_SLEV_TREND
#> 8                EMODPACE_SLEV_MONTHLY_MEAN
#> 9               compass_neatl_hindcast_grid
#>                                         url
#> 1 https://erddap.emodnet-physics.eu/erddap/
#> 2 https://erddap.emodnet-physics.eu/erddap/
#> 3 https://erddap.emodnet-physics.eu/erddap/
#> 4 https://erddap.emodnet-physics.eu/erddap/
#> 5 https://erddap.emodnet-physics.eu/erddap/
#> 6 https://erddap.emodnet-physics.eu/erddap/
#> 7 https://erddap.emodnet-physics.eu/erddap/
#> 8 https://erddap.emodnet-physics.eu/erddap/
#> 9          https://erddap.marine.ie/erddap/
```

## Information

Then you can get information on a single dataset


```r
info('erdCalCOFIlrvsiz')
#> <ERDDAP info> erdCalCOFIlrvsiz 
#>  Base URL: https://upwell.pfeg.noaa.gov/erddap 
#>  Dataset Type: tabledap 
#>  Variables:  
#>      calcofi_species_code: 
#>          Range: 19, 946 
#>      common_name: 
#>      cruise: 
#>      itis_tsn: 
#>      larvae_10m2: 
...
```

## griddap (gridded) data

First, get information on a dataset to see time range, lat/long range, and variables.


```r
(out <- info('erdMBchla1day'))
#> <ERDDAP info> erdMBchla1day 
#>  Base URL: https://upwell.pfeg.noaa.gov/erddap 
#>  Dataset Type: griddap 
#>  Dimensions (range):  
#>      time: (2006-01-01T12:00:00Z, 2023-06-27T12:00:00Z) 
#>      altitude: (0.0, 0.0) 
#>      latitude: (-45.0, 65.0) 
#>      longitude: (120.0, 320.0) 
#>  Variables:  
#>      chlorophyll: 
#>          Units: mg m-3
```

Then query for gridded data using the `griddap()` function


```r
(res <- griddap(out,
  time = c('2015-01-01','2015-01-03'),
  latitude = c(14, 15),
  longitude = c(125, 126)
))
#> <ERDDAP griddap> erdMBchla1day
#>    Path: [/var/folders/xw/mcmsdzzx4mgbttplylgs7ysh0000gp/T//RtmpoME6FV/R/rerddap/4d844aa48552049c3717ac94ced5f9b8.nc]
#>    Last updated: [2023-06-29 13:18:53.945082]
#>    File size:    [0.03 mb]
#>    Dimensions (dims/vars):   [4 X 1]
#>    Dim names: time, altitude, latitude, longitude
#>    Variable names: Chlorophyll Concentration in Sea Water
#>    data.frame (rows/columns):   [5043 X 5]
#> # A tibble: 5,043 × 5
#>    longitude  latitude  altitude time                 chlorophyll
#>    <dbl[1d]> <dbl[1d]> <dbl[1d]> <chr>                      <dbl>
#>  1      125         14         0 2015-01-01T12:00:00Z          NA
#>  2      125.        14         0 2015-01-01T12:00:00Z          NA
#>  3      125.        14         0 2015-01-01T12:00:00Z          NA
#>  4      125.        14         0 2015-01-01T12:00:00Z          NA
#>  5      125.        14         0 2015-01-01T12:00:00Z          NA
#>  6      125.        14         0 2015-01-01T12:00:00Z          NA
#>  7      125.        14         0 2015-01-01T12:00:00Z          NA
#>  8      125.        14         0 2015-01-01T12:00:00Z          NA
#>  9      125.        14         0 2015-01-01T12:00:00Z          NA
#> 10      125.        14         0 2015-01-01T12:00:00Z          NA
#> # ℹ 5,033 more rows
```

The output of `griddap()` is a list that you can explore further. Get the summary


```r
res$summary
#> $filename
#> [1] "/var/folders/xw/mcmsdzzx4mgbttplylgs7ysh0000gp/T//RtmpoME6FV/R/rerddap/4d844aa48552049c3717ac94ced5f9b8.nc"
#> 
#> $writable
#> [1] FALSE
#> 
#> $id
#> [1] 65536
#> 
#> $error
#> [1] FALSE
#> 
#> $safemode
#> [1] FALSE
#> 
...
```

Get the dimension variables


```r
names(res$summary$dim)
#> [1] "time"      "altitude"  "latitude"  "longitude"
```

Get the data.frame (beware: you may want to just look at the `head` of the data.frame if large)


```r
head(res$data)
#>   longitude latitude altitude                 time chlorophyll
#> 1   125.000       14        0 2015-01-01T12:00:00Z          NA
#> 2   125.025       14        0 2015-01-01T12:00:00Z          NA
#> 3   125.050       14        0 2015-01-01T12:00:00Z          NA
#> 4   125.075       14        0 2015-01-01T12:00:00Z          NA
#> 5   125.100       14        0 2015-01-01T12:00:00Z          NA
#> 6   125.125       14        0 2015-01-01T12:00:00Z          NA
```

## tabledap (tabular) data


```r
(out <- info('erdCalCOFIlrvsiz'))
#> <ERDDAP info> erdCalCOFIlrvsiz 
#>  Base URL: https://upwell.pfeg.noaa.gov/erddap 
#>  Dataset Type: tabledap 
#>  Variables:  
#>      calcofi_species_code: 
#>          Range: 19, 946 
#>      common_name: 
#>      cruise: 
#>      itis_tsn: 
#>      larvae_10m2: 
...
```


```r
(dat <- tabledap('erdCalCOFIlrvsiz', fields=c('latitude','longitude','larvae_size',
  'scientific_name'), 'time>=2011-01-01', 'time<=2011-12-31'))
#> <ERDDAP tabledap> erdCalCOFIlrvsiz
#>    Path: [/var/folders/xw/mcmsdzzx4mgbttplylgs7ysh0000gp/T//RtmpoME6FV/R/rerddap/db7389db5b5b0ed9c426d5c13bc43d18.csv]
#>    Last updated: [2023-06-29 13:18:57.579066]
#>    File size:    [0.05 mb]
#> # A tibble: 1,304 × 4
#>    latitude  longitude  larvae_size scientific_name     
#>    <chr>     <chr>      <chr>       <chr>               
#>  1 32.956665 -117.305   4.5         Engraulis mordax    
#>  2 32.91     -117.4     5.0         Merluccius productus
#>  3 32.511665 -118.21167 2.0         Merluccius productus
#>  4 32.511665 -118.21167 3.0         Merluccius productus
#>  5 32.511665 -118.21167 5.5         Merluccius productus
#>  6 32.511665 -118.21167 6.0         Merluccius productus
#>  7 32.511665 -118.21167 2.8         Merluccius productus
#>  8 32.511665 -118.21167 3.0         Sardinops sagax     
#>  9 32.511665 -118.21167 5.0         Sardinops sagax     
#> 10 32.511665 -118.21167 2.5         Engraulis mordax    
#> # ℹ 1,294 more rows
```

Since both `griddap()` and `tabledap()` give back data.frame's, it's easy to do downstream manipulation. For example, we can use `dplyr` to filter, summarize, group, and sort:


```r
library("dplyr")
dat$larvae_size <- as.numeric(dat$larvae_size)
dat %>%
  group_by(scientific_name) %>%
  summarise(mean_size = mean(larvae_size)) %>%
  arrange(desc(mean_size))
#> # A tibble: 7 × 2
#>   scientific_name       mean_size
#>   <chr>                     <dbl>
#> 1 Anoplopoma fimbria        23.3 
#> 2 Engraulis mordax           9.26
#> 3 Sardinops sagax            7.28
#> 4 Merluccius productus       5.48
#> 5 Tactostoma macropus        5   
#> 6 Scomber japonicus          3.4 
#> 7 Trachurus symmetricus      3.29
```