A multiproxy calibration dataset to estimate PDFs from a global collection of geolocalised presence-only data (hereafter proxy distributions) was first presented in [1]. These data were obtained from the Global Biodiversity Information Facility (GBIF) database, an online collection of geolocalised observations of biological entities. The calibration dataset (hereafter gbif4crest) contains the species distributions of six common palaeoecological fossil proxies: the five taxa presented in the original version of the dataset (plants [2-12] for fossil pollen and macrofossils, chironomids [13], beetles [14], diatoms [15] and foraminifera [16]), to which rodents [17] were recently added (Fig. 1).
The coordinates of all the presence records of these six common
palaeoecological fossil proxies were upscaled to a spatial resolution of
0.25° × 0.25° (hereafter QDGC for Quarter-Degree Grid Cell) and
subsequently associated with terrestrial and oceanic environmental
variables at the same resolution [18-24] (see details in Table 1).
The QDGC spatial resolution is an empirical trade-off between numerous
factors, including the resolution of the presence data, the quality of
the data or the spatial representativity of the studied proxy. However,
this trade-off may be suboptimal in some situations, and for that reason,
crestr can also be used with the raw GBIF data and even
alternative calibration datasets.
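The upscaling step described above can be illustrated with a minimal sketch. It assumes that each record is assigned to the centre of the 0.25° cell that contains it and that duplicates are then removed per taxon and cell; the function name `to_qdgc`, the taxon labels and the coordinates are hypothetical and are not taken from gbif4crest or crestr.

```python
import math

CELL = 0.25  # grid resolution in degrees (QDGC)


def to_qdgc(lon, lat, cell=CELL):
    """Return the centre of the grid cell that contains (lon, lat).

    Assumption: cells are aligned on multiples of `cell` and identified
    by their centre. The actual gbif4crest convention may differ.
    """
    # floor() selects the south-west corner of the cell;
    # adding half a cell moves to its centre.
    x = math.floor(lon / cell) * cell + cell / 2
    y = math.floor(lat / cell) * cell + cell / 2
    return (round(x, 3), round(y, 3))


# Hypothetical raw presence records: (taxon, longitude, latitude)
records = [
    ("Taxon A", 18.42, -33.93),
    ("Taxon A", 18.49, -33.99),  # falls in the same cell as the record above
    ("Taxon A", 18.51, -33.93),  # falls in the neighbouring cell
    ("Taxon B", 18.42, -33.93),
]

# One entry per (taxon, cell): repeated observations in a cell collapse.
unique_presences = sorted(
    {(taxon, *to_qdgc(lon, lat)) for taxon, lon, lat in records}
)

for entry in unique_presences:
    print(entry)
```

In this sketch the four raw records reduce to three unique taxon-cell entries, which is the sense in which the dataset counts "unique" presence data at the QDGC resolution.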
In its current version (V2), the gbif4crest calibration
dataset contains about 25.3 million unique presence data for the six
proxies. Unfortunately, the density of available data varies strongly
between proxies and regions (Fig. 1). Plant data
dominate the calibration dataset (>22 million unique occurrences) and
allow for the use of
crestr across all landmasses where
vegetation currently grows. For the five other proxies, the datasets are
still incomplete in many regions, which restricts where
crestr can be applied (e.g. for chironomids). However, these datasets
are regularly updated by GBIF. For example, the first version of the
gbif4crest dataset released in 2018 contained about 17.5
million QDGC entries, but the new version presented here contains nearly
25.3 million entries (~44% increase). The range of ‘reconstructible’
areas is thus rapidly broadening (see, for instance, the coverage of
Russia by plant data compared to the first version of the
gbif4crest dataset [1]).
Table 1 List of terrestrial and marine variables available in the gbif4crest database. Each one can be selected in crestr using its associated code. List of abbreviations: (Temp.) Temperature, (Precip.) Precipitation, (SST) Sea Surface Temperature, (SSS) Sea Surface Salinity.
|Code|Variable (unit)|
|---|---|
|bio1|Mean Annual Temp. (°C)|
|bio2|Mean Diurnal Range (°C)|
|bio4|Temp. Seasonality (standard deviation × 100) (°C)|
|bio5|Max Temp. of the Warmest Month (°C)|
|bio6|Min Temp. of the Coldest Month (°C)|
|bio7|Temp. Annual Range (°C)|
|bio8|Mean Temp. of the Wettest Quarter (°C)|
|bio9|Mean Temp. of the Driest Quarter (°C)|
|bio10|Mean Temp. of the Warmest Quarter (°C)|
|bio11|Mean Temp. of the Coldest Quarter (°C)|
|bio12|Annual Precip. (mm)|
|bio13|Precip. of the Wettest Month (mm)|
|bio14|Precip. of the Driest Month (mm)|
|bio15|Precip. Seasonality (Coefficient of Variation) (%)|
|bio16|Precip. of the Wettest Quarter (mm)|
|bio17|Precip. of the Driest Quarter (mm)|
|bio18|Precip. of the Warmest Quarter (mm)|
|bio19|Precip. of the Coldest Quarter (mm)|
|ai|Aridity Index (unitless)|
|sst_ann|Mean Annual SST (°C)|
|sst_jfm|Mean Winter SST (°C)|
|sst_amj|Mean Spring SST (°C)|
|sst_jas|Mean Summer SST (°C)|
|sst_ond|Mean Fall SST (°C)|
|sss_ann|Mean Annual SSS (PSU)|
|sss_jfm|Mean Winter SSS (PSU)|
|sss_amj|Mean Spring SSS (PSU)|
|sss_jas|Mean Summer SSS (PSU)|
|sss_ond|Mean Fall SSS (PSU)|
|diss_oxy|Dissolved Oxygen Concentration (mol/L)|
|nitrate|Nitrate Concentration (mol/L)|
|phosphate|Phosphate Concentration (mol/L)|
|silicate|Silicate Concentration (mol/L)|
|icec_ann|Mean Annual Sea Ice Concentration (%)|
|icec_jfm|Mean Winter Sea Ice Concentration (%)|
|icec_amj|Mean Spring Sea Ice Concentration (%)|
|icec_jas|Mean Summer Sea Ice Concentration (%)|
|icec_ond|Mean Fall Sea Ice Concentration (%)|
All these data were curated in a relational database to ensure the
consistency of the data (Fig. 2). The
gbif4crest database is composed of three main types of data:
taxonomic data (TAXA table on Fig. 2), distribution data (the
distribution tables, including DISTRIB_QDGC) and diverse geopolitical,
climatological and environmental data (DATA_QDGC table). Its structure
is slightly different from the first version: all the distinct QDGC
tables are now grouped in a single DATA_QDGC table to enable faster
data extraction.
Additional environmental and geographical descriptors were added to
characterise each grid cell and enable a more refined data selection.
These include elevation and elevation variability [25], the country (www.naturalearthdata.com) or ocean (www.marineregions.org) names, as well as different
levels of ecological classification for the terrestrial [26] and marine [27]
realms. The first and last observation dates are also now included,
along with the type of observation, as reported by GBIF (see
DISTRIB_QDGC table on Fig. 2). Finally, the
DATA table was entirely recalculated using a new
protocol that better accounts for coastal margins. Climate values at
some locations are thus expected to be slightly different from the first
version of the gbif4crest dataset.
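To make the three-part structure more concrete, the sketch below builds a toy in-memory SQLite database with one table per data type and joins them to retrieve the climate values of the cells where a taxon occurs inside a bounding box. Only the table names TAXA, DISTRIB_QDGC and DATA_QDGC come from the text; every column name, the helper `climate_for_taxon` and all values are invented for illustration and should not be read as the actual gbif4crest schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified schema (column names are assumptions).
con.executescript("""
CREATE TABLE TAXA (taxonid INTEGER PRIMARY KEY, taxon_name TEXT);
CREATE TABLE DATA_QDGC (
    longitude REAL, latitude REAL,
    elevation REAL, country TEXT, bio1 REAL,
    PRIMARY KEY (longitude, latitude)
);
CREATE TABLE DISTRIB_QDGC (
    taxonid INTEGER REFERENCES TAXA(taxonid),
    longitude REAL, latitude REAL
);
""")

# Invented example rows, for illustration only.
con.executemany("INSERT INTO TAXA VALUES (?, ?)",
                [(1, "Taxon A"), (2, "Taxon B")])
con.executemany("INSERT INTO DATA_QDGC VALUES (?, ?, ?, ?, ?)", [
    (18.375, -33.875, 120.0, "South Africa", 16.5),
    (18.625, -33.875, 210.0, "South Africa", 16.1),
    (2.375, 48.875, 60.0, "France", 11.3),
])
con.executemany("INSERT INTO DISTRIB_QDGC VALUES (?, ?, ?)", [
    (1, 18.375, -33.875), (1, 18.625, -33.875),
    (1, 2.375, 48.875), (2, 2.375, 48.875),
])


def climate_for_taxon(con, name, xmn, xmx, ymn, ymx):
    """Return (longitude, latitude, bio1) for the cells occupied by a taxon
    within a bounding box, by joining the three tables."""
    sql = """
        SELECT d.longitude, d.latitude, d.bio1
        FROM TAXA t
        JOIN DISTRIB_QDGC p ON p.taxonid = t.taxonid
        JOIN DATA_QDGC d
          ON d.longitude = p.longitude AND d.latitude = p.latitude
        WHERE t.taxon_name = ?
          AND d.longitude BETWEEN ? AND ?
          AND d.latitude BETWEEN ? AND ?
        ORDER BY d.longitude, d.latitude
    """
    return con.execute(sql, (name, xmn, xmx, ymn, ymx)).fetchall()


rows = climate_for_taxon(con, "Taxon A", 10, 30, -40, -20)
for row in rows:
    print(row)
```

Keeping one row per grid cell in a single environmental table means that a taxon's climate values are obtained with one join on the cell coordinates, which is consistent with the faster extraction mentioned above.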
Due to its large size (about 15 GB), this database is not downloaded
when installing the package, but it can be accessed in three different ways. First,
the data are stored in an open-access, cloud-based PostgreSQL database
that can be dynamically accessed via
crestr. This is the
recommended option: users without any prior SQL knowledge
can rely on the package’s interface to query the database
automatically. Providing study-specific parameters, such as the names
of the taxa or the boundaries of the study area, is enough to import all the
necessary data into the R environment in the correct format. Second,
advanced users can also directly query the database to extract and
curate data from the
tables using the dbRequest()
function, and subsequently associate these data with climate variables.
Finally, the complete gbif4crest calibration dataset can also
be downloaded as a SQLite3 portable database file from here.
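For readers who choose the downloaded SQLite3 file, a first step is often to open it read-only and list its tables. The sketch below shows one way to do so with Python's standard `sqlite3` module. The helper `list_tables` is hypothetical, and a small temporary file stands in for the real download so that the example runs without the full database; the two tables it creates are placeholders, not the actual schema.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def list_tables(path):
    """Open a SQLite file read-only and return its table names, sorted."""
    # mode=ro guards against accidental writes to the calibration file.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]


# Stand-in for the downloaded file (placeholder tables and columns).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")
con = sqlite3.connect(demo_path)
con.execute("CREATE TABLE TAXA (taxonid INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE DATA_QDGC (longitude REAL, latitude REAL)")
con.commit()
con.close()

tables = list_tables(demo_path)
print(tables)
```

With the real file, `demo_path` would simply be replaced by the location of the downloaded database.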
[1] Chevalier, M., 2019. Enabling possibilities to quantify past climate from fossil assemblages at a global scale. Global and Planetary Change, 175, pp. 27–35. doi:10.1016/j.earscirev.2020.103384.
[2] GBIF, 2020, Anthocerotopsida occurrence data downloaded on September 24th, 2020. doi:10.15468/dl.t9zenf.
[3] GBIF, 2021, Bryophyta occurrence data downloaded on August 2nd, 2021. doi:10.15468/DL.WD527G.
[4] GBIF, 2020, Cycadopsidae occurrence data downloaded on September 24th, 2020. doi:10.15468/dl.sfjzxu.
[5] GBIF, 2020, Gingkoopsidae occurrence data downloaded on September 24th, 2020. doi:10.15468/dl.da9wz8.
[6] GBIF, 2020, Gnetopsidae occurrence data downloaded on September 24th, 2020. doi:10.15468/dl.h2kjnc.
[7] GBIF, 2020, Liliopsida occurrence data downloaded on September 24th, 2020. doi:10.15468/dl.axv3yd.
[8] GBIF, 2020, Lycopodiopsida occurrence data downloaded on September 24th, 2020. doi:10.15468/dl.ydhyhz.
[9] GBIF, 2020, Magnoliopsida occurrence data downloaded on September 24th, 2020. doi:10.15468/dl.ra49dt.
[10] GBIF, 2021, Marchantiophyta occurrence data downloaded on August 2nd, 2021. doi:10.15468/DL.M2SSE4.
[11] GBIF, 2020, Pinopsidae occurrence data downloaded on September 24th, 2020. doi:10.15468/dl.x2r7pa.
[12] GBIF, 2020, Polypodiopsida occurrence data downloaded on September 24th, 2020. doi:10.15468/dl.87tbp6.
[13] GBIF, 2020, Chironomids occurrence data downloaded on September 24th, 2020. doi:10.15468/dl.jv3wsh.
[14] GBIF, 2020, Beetles occurrence data downloaded on September 24th, 2020. doi:10.15468/dl.nteruy.
[15] GBIF, 2020, Diatoms occurrence data downloaded on September 24th, 2020. doi:10.15468/dl.vfr257.
[16] GBIF, 2020, Foraminifera occurrence data downloaded on September 24th, 2020. doi:10.15468/dl.692yg6.
[17] GBIF, 2020, Rodentia occurrence data downloaded on September 24th, 2020. doi:10.15468/dl.fscw6q.
[18] Fick, S.E. and Hijmans, R.J., 2017, WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology, 37, pp. 4302–4315. doi:10.1002/joc.5086.
[19] Zomer, R.J., Trabucco, A., Bossio, D.A. and Verchot, L.V., 2008, Climate change mitigation: A spatial analysis of global land suitability for clean development mechanism afforestation and reforestation. Agriculture, Ecosystems & Environment, 126, pp. 67–80. doi:10.1016/j.agee.2008.01.014.
[20] Locarnini, R.A., Mishonov, A.V., Baranova, O.K., Boyer, T.P., Zweng, M.M., Garcia, H.E., Reagan, J.R., Seidov, D., Weathers, K.W., Paver, C.R., Smolyar, I.V. and others, 2019, World Ocean Atlas 2018, Volume 1: Temperature. NOAA Atlas NESDIS 81, 52 pp.
[21] Zweng, M.M., Seidov, D., Boyer, T.P., Locarnini, R.A., Garcia, H.E., Mishonov, A.V., Baranova, O.K., Weathers, K.W., Paver, C.R., Smolyar, I.V. and others, 2018, World Ocean Atlas 2018, Volume 2: Salinity. NOAA Atlas NESDIS 82, 50 pp.
[22] Garcia, H.E., Weathers, K.W., Paver, C.R., Smolyar, I.V., Boyer, T.P., Locarnini, R.A., Zweng, M.M., Mishonov, A.V., Baranova, O.K., Seidov, D. and Reagan, J.R., 2019, World Ocean Atlas 2018, Volume 3: Dissolved Oxygen, Apparent Oxygen Utilization, and Dissolved Oxygen Saturation. NOAA Atlas NESDIS 83, 38 pp.
[23] Garcia, H.E., Weathers, K.W., Paver, C.R., Smolyar, I.V., Boyer, T.P., Locarnini, R.A., Zweng, M.M., Mishonov, A.V., Baranova, O.K., Seidov, D., Reagan, J.R. and others, 2019, World Ocean Atlas 2018, Volume 4: Dissolved Inorganic Nutrients (phosphate, nitrate and nitrate+nitrite, silicate). NOAA Atlas NESDIS 84, 35 pp.
[24] Reynolds, R.W., Smith, T.M., Liu, C., Chelton, D.B., Casey, K.S. and Schlax, M.G., 2007, Daily high-resolution-blended analyses for sea surface temperature. Journal of Climate, 20, pp. 5473–5496. doi:10.1175/2007JCLI1824.1.
[25] Amante, C. and Eakins, B.W., 2009, ETOPO1 1 Arc-Minute Global Relief Model: Procedures, Data Sources and Analysis. NOAA Technical Memorandum NESDIS NGDC-24. National Geophysical Data Center, NOAA. doi:10.7289/V5C8276M.
[26] Olson, D.M., Dinerstein, E., Wikramanayake, E.D., Burgess, N.D., Powell, G.V.N., Underwood, E.C., D’amico, J.A., Itoua, I., Strand, H.E., Morrison, J.C., Loucks, C.J., Allnutt, T.F., Ricketts, T.H., Kura, Y., Lamoreux, J.F., Wettengel, W.W., Hedao, P. and Kassem, K.R., 2001, Terrestrial Ecoregions of the World: A New Map of Life on Earth: A new global map of terrestrial ecoregions provides an innovative tool for conserving biodiversity. BioScience, 51, pp. 933. doi:10.1641/0006-3568(2001)051[0933:TEOTWA]2.0.CO;2.
[27] Costello, M.J., Tsai, P., Wong, P.S., Cheung, A.K.L., Basher, Z. and Chaudhary, C., 2017, Marine biogeographic realms and species endemicity. Nature Communications, 8, pp. 1–9. doi:10.1038/s41467-017-01121-2.