--- title: "3. Find the area of origin and the history of past contacts: RRphylogeography" author: "Alessandro Mondanaro, Silvia Castiglione, Pasquale Raia" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{RRphylogeo} %\VignetteEngine{knitr::rmarkdown_notangle} %\VignetteEncoding{UTF-8} --- ```{r include = FALSE} if (!requireNamespace("rmarkdown", quietly = TRUE) || !rmarkdown::pandoc_available()) { warning(call. = FALSE, "Pandoc not found, the vignettes is not built") knitr::knit_exit() } misspacks<-sapply(c("rnaturalearth","ggplot2"),requireNamespace,quietly=TRUE) if(any(!misspacks)){ warning(call. = FALSE,paste(names(misspacks)[which(!misspacks)],collapse=", "), "not found, the vignettes is not built") knitr::knit_exit() } knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning=FALSE, message=FALSE ) require(RRgeo) require(rnaturalearth) require(ggplot2) require(terra) require(sf) options(rmarkdown.html_vignette.check_title = FALSE) load(file.path(dirname(getwd()),"inst","exdata","Ursus_occurrences.Rda")) rast(file.path(dirname(getwd()),"inst","exdata","U.arctos_suitability.tif"))->map1 rast(file.path(dirname(getwd()),"inst","exdata","U.maritimus_suitability.tif"))->map2 ``` ## Index 1. [Introduction](#intro) 2. [Formatting the data](#format) 3. [RRphylogeography](#modeling) 4. [RRphylogeography outputs](#output) ## 1. Introduction {#intro} The function `RRphylogeography` is designed to find the area of origin (AOO) and/or identify suitable area of contact between a pair of target species, taking into account their evolutionary history and environmental preferences. To run `RRphylogeography`, the user must provide the presence datapoints and the habitat suitability (HS) maps for both target species. The outputs of `RRphylogeography` consists of relative probability of occurrence (RPO) maps, which are calculated based on three elements: the habitat suitability values in each grid cell, the kernel density estimations derived from the occurrences, and the spatial distance (inclusive of the path cost for moving) between the most suitable cells of the two target species. The user can customize several parameters in `RRphylogeography` to account for different phylogenetic relationships between the species pair (i.e. whether budding cladogenesis, branching cladogenesis or anagenesis are suspected to have occurred) and to adjust the conductance matrix which affects the difficulty of moving across the landscape. In the following chunks, we provide a detailed walkthrough on how to format data and run `RRphylogeography` to identify the suitable area of contact between *Ursus arctos* and *Ursus maritimus* (Mondanaro et al. 2025) at 35 kya. **Important**: `RRphylogeography` is able to work with the habitat suitability (HS) maps generated by any Species Distribution Models (SDMs) algorithm, as long as they are in the 0-1 range. This flexibility allows users to calibrate their SDMs using statistical/regression models, similarity/envelope models, and machine learning methods, depending on their preferred approach. By accommodating a wide range of modeling techniques, `RRphylogeography` ensures broad applicability across different ecological and evolutionary studies. ## 2. Formatting data {#format} Other than the name of the target species (i.e., `spec1` and `spec2` arguments), `RRphylogeography` requires two mandatory arguments: * `pred`: list of `SpatRaster` objects containing the habitat suitability maps for both species. * `occs`: list of `sf::data.frame` objects containing the presence data for both species. Both lists must be named and ordered according to `spec1` and `spec2`. **Note**: `RRphylogeography` can also process a list of stacked `SpatRaster` objects. This is particularly useful when users have SDMs projected across different time frames as it allows to input the habitat suitability maps all together. ```{r loadoccs1, eval=FALSE} library(RRgeo) library(terra) library(sf) rast(system.file("exdata/U.arctos_suitability.tif", package="RRgeo"))->map1 rast(system.file("exdata/U.maritimus_suitability.tif", package="RRgeo"))->map2 load(system.file("exdata/Ursus_occurrences.RDa", package="RRgeo")) ``` ```{r loadoccs2} list(Ursus_arctos=map1,Ursus_maritimus=map2)->pred list(Ursus_arctos=occs_arctos,Ursus_maritimus=occs_marit)->occs head(occs$Ursus_arctos) ``` The standard format for individual elements of `occs` must include the column containing species occurrence data in binary format ("OBS") and the "geometry" column. Additionally, if available, a column representing the time intervals associated with each species presence may be added ("TIME_factor"). All other columns in the `data.frame` are ignored. ## 3. Run RRphylogeography {#modeling} `RRphylogeography` also accepts: * `th`: threshold value for defining the most suitable cells for both target species. `RRphylogeography` function considers all the habitat suitability values greater than 0 which exceed the `th` quantile as the suitable cells to consider. This step is crucial because the subsequent distance calculations are restricted to these most suitable cells only. The latter procedure relies on the R package * `leastcostpath` (Lewis, 2023). * `mask_for_pred`: a `SpatRaster` object used to define the spatial extent of `RRphylogeography` outputs. * `resistance_map`: a `SpatRaster` object to determine the difficulty of moving across the landscape * `time_col`: the name of the column containing time intervals associated to species presence records. If `time_col` is indicated, the function incorporates time intervals as weights when calculating the kernel density. * `kde_inversion`: `TRUE/FALSE`, this sets how time intervals should be used as weights when calculating the kernel density. This depends on the mode of speciation thought to link `spec1` to `spec2` and the research objective. If the study assumes an anagenetic process linking the target species, the AOO of the descendant species becomes the focus. In this case, setting kde_inversion = TRUE is appropriate as it gives higher weights to younger occurrences of the ancestor species. On the contrary, if the goal is to identify the AOO or the past contact of two coeval species, temporal weights should be considered similar, making `kde_inversion` unnecessary. * `standardize`: if `TRUE`, the output maps are rescaled between 0 and 1. ```{r RRphylogeo,eval=FALSE} yourdir<-"YOUR_DIRECTORY" setwd(yourdir) RRphylogeography(spec1="Ursus_arctos", spec2="Ursus_maritimus", pred=pred, occs=occs, aggr=2, time_col="TIME_factor", kde_inversion=FALSE, resistance_map=NULL, clust=0.5, plot=FALSE, mask_for_pred=NULL, standardize=TRUE, output.dir=yourdir) ``` **Important**: The computational time of `RRphylogeography` is influenced by several factors, including the number of suitable cells for each target species, the `th` value used to define the most suitable cells, and the number of habitat suitability layers. To speed up the procedure, we strongly recommend aggregating the habitat suitability (HS) maps provided as `pred` by setting the `aggr` argument. Since the primary goal of `RRphylogeography` is to identify the area of origin (rather than a single cell of origin) between two target species, using very high spatial resolution maps may not be necessary. The spatial aggregation can drastically reduce computational time without compromising the analysis. Alternatively, users can increase the `th` values or restrict the study area by using the `mask_for_pred` argument. ## 4. `RRphylogeography` outputs {#output} `RRphylogeography` creates within `output.dir` an output folder named by combining the names of the two target species. Therein, `RRphylogeography` stores three `SpatRaster` maps. 1. A SpatRaster object with five layers for `spec1` 2. A SpatRaster object with five layers for `spec2` Both these files include: a. Habitat suitability (HS) map based on threshold values. b. Kernel density estimated from the species occurrence record. c. Distance calculations ("proximity") between the first and the second target species. d. Relative probability of occurrence (RPO) map. e. Relative probability of occurrence (RPO) map excluding the kernel density. 3. A raster object with two layers: a. RPO map which integrates the HS values, kernel density and the distance calculations ("proximity") for both target species ("RPO_combined"). b. RPO map without considering the kernel density ("RPO_combined_noK"). The names of the three output files are a combination of species names and the names of the HS maps. Here, we plot the RPO_combined map of *U. arctos* and *U. maritimus* at 35 kya. The RPO_combined map returned by `RRphylogeography` was enhanced to replicate the one as in Mondanaro et al. (2025). ```{r RPO,out.width='95%',dpi=300,echo=FALSE} knitr::include_graphics("RPO_map.png") ``` ## References * Mondanaro, A., Castiglione, S., Di Febbraro, M., Timmermann, A., Girardi, G., Melchionna, M., Serio, C., Belfiore, A.M., & Raia, P. (2025). RRphylogeography: A new method to find the area of origin of species and the history of past contacts between species. Methods in Ecology and Evolution, 16(3), 546-557.