---
title: "1. Getting started with faunabr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{1. Getting started with faunabr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = F,
  warning = FALSE
)
```

## Introduction
### What is Fauna do Brasil?
The [Catálogo Taxonômico da Fauna do Brasil](http://fauna.jbrj.gov.br/fauna) is the result of the collaborative efforts of over 500 zoologists, each specializing in various animal groups native to Brazil. This comprehensive database offers detailed and standardized morphological descriptions, nomenclatural information, geographic distribution, and identification keys for both native and non-native animals found in Brazil.

The faunabr package provides a suite of functions designed to retrieve, filter, and spatialize data from the Fauna do Brasil dataset.

## Overview of the functions:

+ `get_faunabr()`: download the latest or an older version of Fauna do Brasil database.
+ `fauna_version()`: check if you have the latest version of Fauna do Brasil data available in a directory.
+ `load_faunabr()`: load Fauna do Brasil database.
+ `translate_faunabr()`: translates information in the "lifeForm", "origin", "habitat", "taxonRank" and "taxonomicStatus" columns between Portuguese and English.
+ `fauna_attributes()`: display all the options available to filter species by its characteristics.
+ `select_fauna()`: filter species based on its characteristics and distribution.
+ `extract_binomial()`: extract the binomial name (Genus + specific epithet) from a Scientific Name.
+ `fauna_synonym()`: Retrieve synonyms for species.
+ `check_fauna_names()`: check if the species names are correct.
+ `subset_fauna()`: subset a list of species from Fauna do Brasil.
+ `fauna_spat_occ()`: get Spatial polygons (SpatVectors) of species based on its distribution.
+ `filter_faunabr()`: removes or flags records outside of the species' natural ranges.
+ `fauna_pam()`: get a presence-absence matrix of species based on its distribution.

## Installation

### Install development version from GitHub

You can install the released version of *faunabr* from [github](https://github.com/wevertonbio/faunabr) with:
```{r, results='hide', message=FALSE}
if(!require(remotes)){
    install.packages("remotes")
}

if(!require(faunabr)){
remotes::install_github('wevertonbio/faunabr')}

library(faunabr)
```

## Get data from Fauna do Brasil
All data included in the Fauna do Brasil (i.e., nomenclature, life form, and distribution) are stored in Darwin Core Archive data sets, which is updated often. Before downloading the data available in the Fauna do Brasil, we need to create a folder to save the data:

```{r}
#Creating a folder in a temporary directory
#Replace 'file.path(tempdir(), "faunabr")' by a path folder to be create in 
#your computer
my_dir <- file.path(file.path(tempdir(), "faunabr"))
dir.create(my_dir)
```

You can now utilize the `get_faunabr` function to retrieve the most recent version of the data:

```{r, results='hide', message=FALSE, warning=FALSE}
get_faunabr(output_dir = my_dir, #directory to save the data
            data_version = "latest", #get the most recent version available
            translate = TRUE, #translate data to English
            overwrite = T) #Overwrite data, if it exists
```

The function will take a few seconds to download the data and a few minutes to merge the datasets into a single data.frame. Upon successful completion, you will find a folder named with the version of Fauna do Brasil. This folder contains the downloaded raw dataset (TXT files in Portuguese) and a file named **CompleteBrazilianFauna.gz**. The latter represents the finalized dataset. By default, the data is translated into English, but you can keep the original Portuguese version by setting `translate = FALSE`.

You also have the option to download an older, specific version of the Fauna do Brasil dataset. To explore the available versions, please refer to [this link](https://ipt.jbrj.gov.br/jbrj/resource?r=catalogo_taxonomico_da_fauna_do_brasil). For downloading a particular version, simply replace 'latest' with the desired version number. For example:

```{r, results='hide', message=FALSE, warning=FALSE}
get_faunabr(output_dir = my_dir, #directory to save the data
            data_version = "1.10", #Version 1.10, published on 2024-02-01
            overwrite = T) #Overwrite data, if it exists
```

As previously mentioned, you will find a folder named '1.10' within the designated directory.

To view the available versions in your specified directory, run:

```{r}
fauna_version(data_dir = my_dir)
#> You have the following versions of Fauna do Brasil:
#>  1.17
#>  1.10
#>  It includes the latest version:  1.17
```

## Loading the data
In order to use the other functions of the package, you need to load the data into your environment. To achieve this, utilize the `load_faunabr()` function. By default, the function will automatically search for the latest available version in your directory. However, you have the option to specify a particular version using the *data_version* parameter.
Additionally, you can choose between two versions of the data: the 'short' version (containing the 19 columns required for run the other functions of the package) or the 'complete' version (with all original 31 columns). The function imports the 'short' version by default.

```{r}
#Short version
bf <- load_faunabr(data_dir = my_dir,
                   data_version = "latest",
                   type = "short") #short
#> Loading version 393.401
colnames(bf) #See variables from short version
#>  [1] "species"             "subspecies"          "scientificName"       
#>  [4] "validName"           "kingdom"             "phylum"
#>  [7] "class"               "order"               "family" 
#> [10] "genus"               "lifeForm"            "habitat"
#> [13] "states"              "countryCode"         "origin" 
#> [16] "taxonomicStatus"     "nomenclaturalStatus" "vernacularName"          
#> [19] "taxonRank"           "id"                  "language"
```

Note that the complete version has 12 more columns:

```{r}
#Complete version
bf_complete <- load_faunabr(data_dir = my_dir,
                   data_version = "latest",
                   type = "complete") #complete

colnames(bf_complete) #See variables from complete version
#>  [1] "id"                       "taxonID"                 
#>  [3] "species"                  "subspecies"       
#>  [5] "scientificName"           "validName"                   
#>  [7] "validNameUsage"           "parentNameUsage"                             
#>  [9] "namePublishedInYear"      "higherClassification"      
#> [11] "kingdom"                  "phylum"                                   
#> [13] "class"                    "order"                                   
#> [15] "family"                   "genus"               
#> [17] "specificEpithet"          "infraspecificEpithet"
#> [19] "taxonRank"                "scientificNameAuthorship"                  
#> [21] "taxonomicStatus"          "nomenclaturalStatus" 
#> [23] "vernacularName"           "lifeForm"      
#> [25] "habitat"                  "origin"
#> [27] "states"                   "countryCode"    
#> [29] "modified"                 "bibliographicCitation"                              
#> [31] "relationshipOfResource"   "language"                             
```

### Translate information
By default, the data downloaded from [Catálogo Taxonômico da Fauna do Brasil](http://fauna.jbrj.gov.br/fauna) is translated into English, unless you set `translate = FALSE` in `get_faunabr()` function. However, you can use the `translate_faunabr()` function to translate the loaded data between English and Portuguese:
```{r}
#Translate from English to Portuguese
bf_portuguese <- translate_faunabr(data = bf, to = "pt_br")
# See lifeforms (in portuguese)
fauna_attributes(bf_portuguese, "lifeForm")
#>  $lifeForm
#>                 lifeForm
#> 1               colonial
#> 2               comensal
#> 3           ectoparasito
#> 4        ectoparasitoide
#> 5           endoparasito
#> 6        endoparasitoide
#> 7              epibionte
#> 8              eussocial
#> 9              herbivoro
#> 10      hiperparasitoide
#> 11             inquilino
#> 12                mutual
#> 13           polinizador
#> 14              predador
#> 15                sessil
#> 16 vida_livre_individual

#Translate from Portuguese to English
bf_english <- translate_faunabr(data = fauna_portuguese, to = "en")
#See life forms(in English)
fauna_attributes(bf_english, "lifeForm")
#>  $lifeForm
#>                  lifeForm
#> 1                colonial
#> 2               commensal
#> 3            ectoparasite
#> 4          ectoparasitoid
#> 5            endoparasite
#> 6          endoparasitoid
#> 7                epibiont
#> 8                eusocial
#> 9  free_living_individual
#> 10              herbivore
#> 11        hyperparasitoid
#> 12              inquiline
#> 13            mutualistic
#> 14              polynizer
#> 15               predator
#> 16                sessile

```

## Save data

If you want to save the datasets to open with an external editor (e.g. Excel), you can save the data.frame as a CSV file:
```{r, eval=FALSE, warning=FALSE, message=FALSE, }
write.csv(x = bf,
          file = file.path(my_dir, "BrazilianFauna.csv"),
          row.names = F)
```