---
title: "Using {nichetools} with SIBER"
author: Benjamin L. Hlina
date: "2024-08-29"
output: rmarkdown::html_vignette

vignette: >
  %\VignetteIndexEntry{Using {nichetools} with SIBER}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
  
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```

### Our Objectives

The purpose of this vignette is to use [{SIBER}](https://cran.r-project.org/package=SIBER) and [{nichetools}](https://benjaminhlina.github.io/nichetools/) to extract and then visualize estimates of trophic niche size and similarities and Layman community metrics for multiple freshwater fish using [{ggplot2}](https://ggplot2.tidyverse.org/). 

This vignette can be used for additional purposes including estimating niche size and similarities among different groups of aquatic and/or terrestrial species. Furthermore, niche size and similarities for different behaviours exhibited within a population can be made using behavioural data generated from acoustic telemetry (e.g., differences in habitat occupancy).

### Bring in trophic data
First we will load the necessary packages to preform the analysis and visualization. We will use [{SIBER}](https://cran.r-project.org/package=SIBER) and [{nichetools}](https://benjaminhlina.github.io/nichetools/) to preform the analysis. We will use [{bayestestR}](https://easystats.github.io/bayestestR/) to calculate and extract  medians and Equal-Tailed Interval (ETI) of posterior distributions that we want to plot. We will use [{dplyr}](https://dplyr.tidyverse.org/), [{tidyr}](https://tidyr.tidyverse.org/), and [{purrr}](https://purrr.tidyverse.org/) to manipulate data and iterate processes. Lastly, we will use [{ggplot2}](https://ggplot2.tidyverse.org/), [{ggtext}](https://wilkelab.org/ggtext/), and [{ggdist}](https://mjskay.github.io/ggdist/) to plot and add labels.

I will add that many of the `{dplyr}` and `{tidyr}` functions and processes can be replaced using [{data.table}](https://CRAN.R-project.org/package=data.table) which is great when working with large data sets.

We will first load the packages that were just mentioned 
```{r, message=FALSE}
{
  library(bayestestR)
  library(dplyr)
  library(ggplot2)
  library(ggdist)
  library(ggtext)
  library(nichetools)
  library(purrr)
  library(SIBER)
  library(tidyr)
  library(viridis)
}
```
For the purpose of the vignette we will be using the `demo.siber.data.2` data frame that is available within `{SIBER}`. We will first look at the structure of this data frame using the `{dplyr}` function `glimpse()`. For your purposes you will need to replace this with your data frame either by loading a csv, rds, or qs file. You can do this multiple ways, I prefer using `readr::read_csv()` for a csv but base R's `read.csv()` works perfectly fine. Note, `{SIBER}` functions do not take `tibbles` or `data.tables` so you will have to convert either to `data.frame` class prior to running functions in `{SIBER}`. 

```{r}
glimpse(demo.siber.data.2)
```

You will notice that the community and group column are character strings that are the actual names of the communities and groups. I advise changing them into factors, thus allow you to know the order for each column prior to converting them into a `numeric` and then a `charcter`. The reason why this is important, is that functions in `{SIBER}` use `for` loops based on the indexing of the `SiberObject`. If this order does not match up you can have issues with the names of the communities and groups you are working with.

Let us change these into a `factor`, then preserve the column and create an id column that is the numerical order that will become the community and group names provided to `createSiberObject()`. 

```{r}
demo.siber.data.2 <- demo.siber.data.2 %>%
  mutate(
    group = factor(group), 
    community = factor(community), 
    group_id = as.numeric(group) %>% 
      as.character(),
    community_id = as.numeric(community) %>%
      as.character()
  ) %>% 
  rename(
    group_name = group,
    community_name = community,
    group = group_id,
    community = community_id
  )

glimpse(demo.siber.data.2)
```

After we have done this, we are going to create two data frames that are the names of our communities and groups with their associated id values. We will use these data frames later on to join up the actual names, allowing us to know what estimates belong to which communities and groups. We are doing this because it is unlikely you will have communities and groups named 1, 2, 3 ect. and instead will have actual names. 

```{r}
# ---- create name with group and community data frame ----
cg_names <- demo.siber.data.2 %>%
  distinct(group,
           community, 
           group_name, 
           community_name) %>%
  
  arrange(community, group)

# ---- create community names data frame ---- 
c_names <- demo.siber.data.2 %>% 
  distinct(community, 
           community_name) %>%
  arrange(community)
```
We will then plot our biplot to confirm we have the correct structure. 

```{r}
#| out-width: 100%
ggplot(data = demo.siber.data.2, aes(x = iso1, y = iso2,
                                     colour = group_name)) +
  geom_point() +
  facet_wrap(~ community_name) + 
  scale_colour_viridis_d(option = "A", begin = 0.25, end = 0.85,
                         name = "Groups", alpha = 0.75) + 
  theme_bw(
    base_size = 15
  ) + 
  theme(
    strip.background = element_blank(),
    panel.grid = element_blank(), 
    axis.title = element_markdown(),
    legend.position = "inside",
    legend.position.inside = c(0.65, 0.75)
  ) + 
  labs(
    x = paste0("\U03B4","<sup>", 13, "</sup>", "C", " (‰)"),
    y = paste0("\U03B4","<sup>", 15, "</sup>", "N", " (‰)")
  )
```

Next we will grab the isotopes we need and the community and group ids that have already been renamed  to `community` and `group`. This is important as `creatSiberObject()` will 1) only take the following order with the following names `iso1`, `iso2`, `group`, and `community` and 2) we will transform this `tibble` into a `data.frame` as `{SIBER}` will only work with a `data.frame`. In this case we were using `tibbles` but we could also be working with `data.table` and will need to do the same thing. 

```{r}
demo_siber_data <- demo.siber.data.2 %>% 
  dplyr::select(iso1, iso2, group, community) %>%
  arrange(community, group) %>% 
  as.data.frame() 
```

### Convert to {SIBER} object

First convert to our isotope data into a `{SIBER}` object.

```{r}
siber_example <- createSiberObject(demo_siber_data)
```

Now that this is a `{SIBER}` object we can start doing some analysis using a 
frequentist (e.g., maximum-likelihood) or a Bayesian framework. The data and metrics generated through this analysis by `{SIBER}` can be extracted using functions in `{nichetools}`. 

### Bayesian Ellipse Analysis 

We first need to set the parameters to run the model 

```{r}
# options for running jags
parms <- list()
parms$n.iter <- 2 * 10^4   # number of iterations to run the model for
parms$n.burnin <- 1 * 10^3 # discard the first set of values
parms$n.thin <- 10     # thin the posterior by this many
parms$n.chains <- 2        # run this many chains
```

Next we need to define the priors for each parameter of the model. This includes fitting the ellipses using an Inverse Wishart prior on the covariance matrix ($\Sigma$), and a vague normal prior on the means ($\mu$).

```{r}
# fit the ellipses which uses an Inverse Wishart prior on the 
# covariance matrix Sigma, and a vague normal prior on the 
# means.
priors <- list()
priors$R <- 1 * diag(2)
priors$k <- 2
priors$tau.mu <- 1.0E-3
```

We will now run the model using the function `siberMVN()`. 

```{r, message=FALSE, results='hide'}
ellipses_posterior <- siberMVN(siber_example, parms, priors)
```

### Extract posterior distributions for μ and Σ

We will first extract posterior distribution for $\mu$ using `extract_mu()` in `{nichetools}`. We will need to set the argument `pkg` to `"SIBER"` and the argument `data_format` to `"wide"`. This argument takes `"long"` or `"wide"` which dictates whether the data object returned is in wide or long format. We will also use the function `seperate_wider_delim()` from `{tidyr}` to separate the community and groups names as they are joined by a `.`. We then will use `left_join()` and the `cg_names` data frame we created above to add in the correct community and group names. 

```{r, message = FALSE}
df_mu <- extract_mu(ellipses_posterior, pkg = "SIBER", 
                    data_format = "wide", 
                    community_df = cg_names)
```

We can confirm that the posterior estimates of $\mu$ are correct by plotting them with {ggplot2}.

```{r}
#| out-width: 100%
ggplot() +
  geom_point(data = df_mu, aes(x = d13c, y = d15n,
                               colour = group_name)) +
  geom_point(data = demo.siber.data.2, aes(x = iso1, y = iso2,
                                           colour = group_name)) +
  facet_wrap( ~ community_name) + 
  scale_colour_viridis_d(option = "A", begin = 0.25, end = 0.85,
                         name = "Groups", alpha = 0.75) + 
  theme_bw(
    base_size = 15
  ) + 
  theme(
    strip.background = element_blank(),
    panel.grid = element_blank(), 
    axis.title = element_markdown(),
    legend.position = "inside",
    legend.position.inside = c(0.65, 0.75)
  ) + 
  labs(
    x = paste0("\U03B4","<sup>", 13, "</sup>", "C", " (‰)"),
    y = paste0("\U03B4","<sup>", 15, "</sup>", "N", " (‰)")
  )
```

Notice the density of points in the center of the raw data for the corresponding colours. This density of points are the posterior estimates from the model for $\mu$ and are an indication that the model is iterating over the groups and communities correctly.  

We are also going to extract $\mu$ in long format for creating ellipse, as the function that create ellipse needs the data frame of $\mu$ to be in long format. 

```{r, message=FALSE}
df_mu_long <- extract_mu(ellipses_posterior, pkg = "SIBER", 
                         community_df = cg_names)
```

Next we are going to extract posterior estimates of $\Sigma$ using `extract_sigma()`. 

```{r}
df_sigma <- extract_sigma(ellipses_posterior, pkg = "SIBER")
```

Lastly, we are going to feed each $\mu$ and $\Sigma$ estimate to `niche_ellipse()` to estimate each ellipse. There are few things to know about this function and they include the following: 1) it will randomly sample 10 ellipse from the total posterior distribution of $\mu$ and $\Sigma$, this seems quite standard, however, you can adjust the number of samples by changing the argument `n`. 2) to make the function consistently randomly sample the same set you will need to set `set_seed` to a numerical value. If this is not set then it will randomly sample a different set of 10 ellipses every time. 3) if you would like the function to not randomly sample set the argument `random` to `FALSE`. 4) by default it will tell you how long it takes to generate the ellipse and will have progress bars at each step. If you want to turn this off set `message` to `FALSE`. 5) if you are wanting to change the confidence level of the ellipse you can do so using the argument `p_ell`. This value is bound between 0 and 1. 

```{r, message = FALSE}
df_el <- niche_ellipse(dat_mu = df_mu_long,
                       dat_sigma = df_sigma, 
                       set_seed = 4, n = 20) %>%
  separate_wider_delim(sample_name, cols_remove = FALSE,
                       delim = ".", names = c("community",
                                              "group")) %>%
  left_join(cg_names)
```

Now that we have the ellipses created we can plot them.

```{r}
#| out-width: 100%
ggplot(data = df_el, 
       aes(x = d13c, y = d15n, 
           group = interaction(sample_number, 
                               sample_name), 
           colour = group_name)) + 
  geom_polygon(linewidth = 0.5, fill = NA) + 
  facet_wrap( ~ community_name) + 
  scale_colour_viridis_d(option = "A", begin = 0.25, end = 0.85,
                         name = "Groups", alpha = 0.75) + 
  theme_bw(
    base_size = 15
  ) + 
  theme(
    strip.background = element_blank(),
    panel.grid = element_blank(), 
    legend.position = "inside",
    axis.title = element_markdown(),
    legend.background = element_blank(), 
    legend.position.inside = c(0.65, 0.75)
  ) + 
  labs(
    x = paste0("\U03B4","<sup>", 13, "</sup>", "C", " (‰)"),
    y = paste0("\U03B4","<sup>", 15, "</sup>", "N", " (‰)")
  )
```

### Extract Niche Size 

We can use `siberEllipses()` from {SIBER} to estimate niche size for each posterior sample for each community and group.  

```{r}
sea_b <- siberEllipses(corrected.posteriors = ellipses_posterior)
```

Next using `extract_niche_size()` from `{nichetools}` we can extract niche size which will also add in the correct names for the communities and groups we are working with. 

```{r}
seb_convert <- extract_niche_size(data = sea_b,
                                  pkg = "SIBER",
                                  community_df = cg_names)
```

We will also extract the parametric estimate for niche size using `groupMetricsML()` from `{SIBER}`. 

```{r}
group_ml <- groupMetricsML(siber_example)
```

We can convert the output of this function using `extract_group_metrics()`. 

```{r}
group_convert <- extract_group_metrics(data = group_ml,
                                       community_df = cg_names)
```

The object returned will have maximum-likelihood estimates for the standard ellipse area (SEA), the central standard ellipse area (SEAc), and the total area (TA). For plotting niche size we will use `SEAc`. 

```{r}
sea_c <- group_convert %>% 
  filter(metric == "SEAc")
```

Lastly, we can visualize the extracted niche sizes using {ggplot2}. 

```{r}
#| out-width: 100%
ggplot() +
  geom_violin(data = seb_convert, aes(x = community_name,
                                      y = sea,
                                      fill = group_name)) + 
  scale_fill_viridis_d(option = "A", begin = 0.25, end = 0.85,
                       name = "Groups", alpha = 0.75) + 
  geom_point(data = sea_c, aes(x = community_name, 
                               y = est, 
                               group = group_name,
  ), 
  size = 2.5,
  fill = "white",
  shape = 21,
  position = position_dodge(width = 0.9)) +
  theme_bw(
    base_size = 15
  ) + 
  theme(
    strip.background = element_blank(),
    panel.grid = element_blank(), 
    legend.position = "inside",
    legend.background = element_blank(), 
    legend.position.inside = c(0.85, 0.8)
  ) + 
  labs(
    x = "Communities", 
    y = expression(paste("Niche Size p(", "‰"^2, "| x)"))
  )
```


### Niche Similarties 

Now that we have extracted Bayesian estimates of niche size we are likely wanting to know how much do these niches have in common or are similar. 

We can use maximum-likelihood and Bayesian frameworks to estimate the percentage of similarity within communities between groups and/or among communities with groups being consistent. We can use functions in `{SIBER}` to create maximum-likelihood and Bayesian estimates of niche similarity followed by functions in `{nichetools}` to extract these similarities.

First we need to create our comparisons that we are wanting to evaluate. We can use the function `create_comparisons()` to generate a list that has two-column data frames of each comparison. You can change the argument `comparison` to `"among"` to compare communities for the same group, versus `"within"` which compares groups within a community. 
```{r}
cg_names_within_com <- cg_names %>%
  create_comparisons(comparison = "within")
```

Next we can feed this listed data frames to either `maxLikOverlap()` or `bayesianOverlap()` using `map()` from `{purrr}`. For this exercise I have not changed `.progress` argument of `map()` but when working with large data sets I often turn this argument to `TRUE` to provide a progress update. 

```{r}
ml_within_overlap <- cg_names_within_com %>%
  map(~ maxLikOverlap(.x$cg_1, .x$cg_2, siber_example,
                      p.interval = 0.95, n = 100))
```

Next we can extract estimates using `extract_similarities()` with the `type` argument set to `"ml"`. 

```{r}
ml_95_within_com <- extract_similarities(ml_within_overlap, 
                                         type = "ml", 
                                         community_df = cg_names)
```

Now we will repeat the process for `bayesianOverlap()`, first supplying the function with our list of data frames, next having `map()` iterate over this list. 
```{r}
bayes95_overlap <- cg_names_within_com %>%
  map(~ bayesianOverlap(.x$cg_1, .x$cg_2, ellipses_posterior,
                        draws = 100, p.interval = 0.95,
                        n = 100)
  )
```

Next we can extract estimates using `extract_similarities()` with the `type` argument set to `"bay"`. 

```{r}
bays_95_overlap <- extract_similarities(bayes95_overlap, 
                                        type = "bay",
                                        community_df = cg_names)
```


Now that we have extracted  maximum-likelihood and Bayesian estimates we can visualize them. 

Prior to creating a point interval plot, we need to create a colour palette that will be used to identify each community.  

```{r}
viridis_colors_s <- viridis(4, begin = 0.25, end = 0.85,
                          option = "A",
                          alpha = 0.75
)
```


Now we can use point interval plots to visually represent posterior distributions.  
```{r}
#| out-width: 100%

ggplot() + 
  stat_pointinterval(data = bays_95_overlap,
              aes(x = group_1, 
                  y = prop_overlap, 
                  point_fill = group_2), 
    interval_colour = "grey60",
    point_size = 3,
    shape = 21,
    position = position_dodge(0.4)) + 
  
  geom_point(data = ml_95_within_com, aes(x = group_1, 
                                   y = prop_overlap,
                                   group = group_2), 
             shape = 21,
             fill = "white",
             size = 2,
             alpha = 0.5, 
             position = position_dodge(0.4)) + 
  scale_fill_manual(name = "Group", 
                    aesthetics = "point_fill",
                    values = viridis_colors_s) + 
  theme_bw(
    base_size = 15
  ) + 
  theme(
    panel.grid = element_blank(),
    strip.background = element_blank(),
    legend.position = "inside",
    legend.position.inside = c(0.15, 0.80),
  ) +
  labs(
    x = "Group",
    y = expression(paste("p(", "‰", "|X)"))
  )
```


### Bayesian Estimates of Layman's Community Metrics

First I highly recommend reading [Layman et al. 2007](https://esajournals.onlinelibrary.wiley.com/doi/abs/10.1890/0012-9658%282007%2988%5B42%3ACSIRPF%5D2.0.CO%3B2) to understand the six community metrics that will be estimating using a Bayesian framework in this section. 

To create maximum-likelihood estimate of Layman's community metrics we first need to use `communityMetricsML()` from `{SIBER}`. 

```{r}
community_ml <- communityMetricsML(siber_example)
```


Then we can extract these estimates using `extract_layman()` with the `type` argument set to 
`"ml"`. The default for this argument is `"bay"`. 

```{r}
layman_ml <- extract_layman(community_ml, 
                            type = "ml", 
                            community_df = c_names)
```

To create Bayesian estimates of Layman's community metrics we first need to use `extractPosteriorMeans()` from 
`{SIBER}` to extract posterior estimates of means for each community and group. 

```{r}
mu_post <- extractPosteriorMeans(siber_example, ellipses_posterior)
```

Next we need to use `bayesianLayman()` from `{SIBER}` and use our extracted posterior means to create Bayesian estimates for each community metric.  

```{r}
layman_b <- bayesianLayman(mu.post = mu_post)
```

Once we have created our Bayesian estimates for each community metric we can use `extract_layman()` to extract these estimates. The function will also create the variable `labels` which will first assign a new name to each community metric and secondly will reorder these names based on how these metrics are often viewed. 

```{r}
layman_be <- extract_layman(layman_b, community_df = c_names)
```

Prior to creating a point interval plot, we need to create a colour palette that will be used to identify each community.  

```{r}
viridis_colors <- viridis(2, begin = 0.25, end = 0.85,
                          option = "G",
                          alpha = 0.75
)
```

Lastly, we can visualize the distributions of the posterior estimates using `stat_pointinterval()` from {ggdist}. 

```{r}
#| out-width: 100%

ggplot() + 
  stat_pointinterval(
    data = layman_be, aes(x = labels, 
                          y = post_est, 
                          point_fill = community_name),
    point_size = 2.5,
    interval_colour = "grey60",
    position = position_dodge(0.4),
    shape = 21
  ) + 
  geom_point(data = layman_ml, aes(x = labels, 
                                   y = estimate,
                                   group = community_name), 
             shape = 21,
             fill = "white",
             alpha = 0.5, 
             position = position_dodge(0.4)) + 
  scale_fill_manual(name = "Community", 
                    aesthetics = "point_fill",
                    values = viridis_colors) + 
  theme_bw(
    base_size = 15
  ) + 
  theme(
    panel.grid = element_blank(),
    axis.text = element_markdown(),
    legend.position = "inside",
    legend.position.inside = c(0.88, 0.85),
  ) +
  labs(
    x = "Community Metrics",
    y = expression(paste("p(", "‰", "|X)"))
  )
```


Congratulations, we have successfully used functions from `{SIBER}` to extract and visually represent trophic communities and niche size and similarities. If you something doesn't work/is confusing please reach out.