---
title: "Get started with authoritative"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Get started with authoritative}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r}
#| include: false
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r}
#| label: setup
library(authoritative)
```

This package has two main categories of functionality:

- [Extracting R package author information](#extract-authors): 
  Parse the `DESCRIPTION` file of R packages, or the equivalent data provided as a `data.frame` by the `{pkgsearch}` package,
  to extract authors from the `Author` or `Authors@R` fields.
  These features are only useful in the context of the R package ecosystem.
- [Cleaning and deduplicating names](#clean-authors): 
  Clean and deduplicate a list of names.
  It can be applied to the list of R package author names extracted in the previous step,
  but it also directly applies in other, more generic, tasks where one needs to clean and deduplicate a list of names.

## Extracting R package author information {#extract-authors}

R package authors can be specified in two ways in the `DESCRIPTION` file.

### Extracting R package author information from the `Author` field

The authors can be listed in the `Author` field, as free text. 
However, this method is now actively discouraged by CRAN, and many R user communities, such as rOpenSci.

We would for example have:

```yaml
Author: Ada Lovelace and Charles Babbage
```
Because this is free-text, it could be formatted in many different ways, 
and it is hard to programmatically extract the names.
The `parse_authors()` function provided by the package is designed to split at common delimiter and clean common extra words.
  
```{r}
parse_authors("Ada Lovelace and Charles Babbage")
parse_authors("Ada Lovelace, Charles Babbage")
parse_authors("Ada Lovelace with contributions from Charles Babbage")
parse_authors("Ada Lovelace, Charles Babbage, et al.")
```

### Extracting R package author information from the `Authors@R` field

The authors can also be listed in `Authors@R` field as a string containing R code that generates a vector of `person` objects.
This is the most modern and recommended way to specify authors in the `DESCRIPTION` file.

```yaml
Authors@R: c(
  person("Ada Lovelace", role = c("aut", "cre"), email = "ada@email.com"),
  person("Charles Babbage", role = "aut")
)
```

```{r}
auts <- parse_authors_r("c(
  person('Ada Lovelace', role = c('aut', 'cre'), email = 'ada@email.com'),
  person('Charles Babbage', role = 'aut')
)")

class(auts)

str(auts)

print(auts)
```

If we only want the names, we can use the `format.person()` base R function:

```{r}
format(auts, include = c("given", "family"))
```

## Cleaning and deduplicating author names {#clean-authors}

We can now take the list of authors extracted from the previous step, or an independently gathered list of names, and clean and deduplicate it.

### Harmonize differently abbreviated names

The `expand_names()` function can be used to expand differently abbreviated names to a common form, passed in the `expanded` argument:

```{r}
expand_names(c("Ada Lovelace", "A Lovelace"), expanded = "Ada Lovelace")
```

However, a common pattern is to pass the vector to clean itself in `expanded`.
This way, you can harmonize names to their longest form in the vector, even if you do not know the full name of all authors in advance:

```{r}
my_names <- c("Ada Lovelace", "A Lovelace", "Charles Babbage")
expand_names(my_names, my_names)
```