---
title: "Introduction to crandep"
date: "2023-08-17"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to crandep}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

This vignette provides an introduction to the functions facilitating the analysis of the dependencies of CRAN packages, specifically `get_dep()` and `df_to_graph()`.

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, message = FALSE}
library(crandep)
library(dplyr)
library(igraph)
```


## One or multiple types of dependencies
To obtain the information about various kinds of dependencies of a package, we can use the function `get_dep()` which takes the package name and the type of dependencies as the first and second arguments, respectively. Currently, the second argument accepts a character vector of one or more of the following words: `Depends`, `Imports`, `LinkingTo`, `Suggests`, `Enhances`, or any variations in their letter cases, or if `LinkingTo` is written as `Linking_To` or `Linking To`.

```{r}
get_dep("dplyr", "Imports")
get_dep("MASS", c("depends", "suggests"))
```

For more information on different types of dependencies, see [the official guidelines](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Package-Dependencies) and [https://r-pkgs.org/description.html](https://r-pkgs.org/description.html).

In the output, the column `type` is the type of the dependency converted to lower case. Also, `LinkingTo` is now converted to `linking to` for consistency.

```{r}
get_dep("xts", "LinkingTo")
get_dep("xts", "linking to")
```

For the reverse dependencies, instead of including the prefix "Reverse " in `type`, we use the argument `reverse`:

```{r}
get_dep("abc", c("depends", "depends"), reverse = TRUE)
get_dep("xts", c("linking to", "linking to"), reverse = TRUE)
```
Theoretically, for each forward dependency
```{r, echo=FALSE}
data.frame(from = "A", to = "B", type = "c", reverse = FALSE)
```
there should be an equivalent reverse dependency
```{r, echo=FALSE}
data.frame(from = "B", to = "A", type = "c", reverse = TRUE)
```
Aligning the `type` in the forward and reverse dependencies enables this to be checked easily.

To obtain all types of dependencies, we can use `"all"` in the second argument, instead of typing a character vector of all 5 words:

```{r}
df0.rstan <- get_dep("rstan", "all")
dplyr::count(df0.rstan, type)
df1.rstan <- get_dep("rstan", "all", reverse = TRUE) # too many rows to display
dplyr::count(df1.rstan, type) # hence the summary using count()
```


## Building and visualising a dependency network
To build a dependency network, we have to obtain the dependencies for multiple packages. For illustration, we choose the [core packages of the tidyverse](https://www.tidyverse.org/packages/), and find out what each package `Imports`. We put all the dependencies into one data frame, in which the package in the `from` column imports the package in the `to` column. This is essentially the edge list of the dependency network.

```{r}
df0.imports <- rbind(
  get_dep("ggplot2", "Imports"),
  get_dep("dplyr", "Imports"),
  get_dep("tidyr", "Imports"),
  get_dep("readr", "Imports"),
  get_dep("purrr", "Imports"),
  get_dep("tibble", "Imports"),
  get_dep("stringr", "Imports"),
  get_dep("forcats", "Imports")
)
head(df0.imports)
tail(df0.imports)
```

With the help of the 'igraph' package, we can use this data frame to build a graph object that represents the dependency network.

```{r, out.width="660px", out.height="660px", fig.width=12, fig.height=12, fig.show="hold"}
g0.imports <- igraph::graph_from_data_frame(df0.imports)
set.seed(1457L)
old.par <- par(mar = rep(0.0, 4))
plot(g0.imports, vertex.label.cex = 1.5)
par(old.par)
```

The nature of a dependency network makes it a directed acyclic graph (DAG). We can use the 'igraph' function `is_dag()` to check.

```{r}
igraph::is_dag(g0.imports)
```

Note that this applies to `Imports` (and `Depends`) only due to their nature. This acyclic nature does not apply to a network of, for example, `Suggests`.


## Boundary and giant component
It is possible to set a boundary on the nodes to which the edges are directed, using the function `df_to_graph()`. The second argument takes in a data frame that contains the list of such nodes in the column `name`.
```{r, out.width="660px", out.height="660px", fig.width=12, fig.height=12, fig.show="hold"}
df0.nodes <-
  data.frame(
    name = c("ggplot2", "dplyr", "tidyr", "readr", "purrr", "tibble", "stringr", "forcats"),
    stringsAsFactors = FALSE
  )
g0.core <- df_to_graph(df0.imports, df0.nodes)
set.seed(259L)
old.par <- par(mar = rep(0.0, 4))
plot(g0.core, vertex.label.cex = 1.5)
par(old.par)
```


## Going forward
In [this other vignette](cran.html), we show how to obtain the dependency network of **all** CRAN packages using other functions in the package. The number of reverse dependencies can then be [modelled](degree.html).