---
title: "Introduction to crandep"
date: "2023-08-17"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to crandep}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

This vignette introduces the functions that facilitate the analysis of the dependencies of CRAN packages, specifically `get_dep()`, `df_to_graph()` and `topo_sort_kahn()`.

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, message = FALSE}
library(crandep)
library(dplyr)
library(igraph)
```

## One or multiple types of dependencies

To obtain information about the various kinds of dependencies of a package, we can use the function `get_dep()`, which takes the package name and the type of dependencies as the first and second arguments, respectively. Currently, the second argument accepts a character vector of one or more of the following words: `Depends`, `Imports`, `LinkingTo`, `Suggests`, `Enhances`. Any variation in letter case is accepted, and `LinkingTo` may also be written as `Linking_To` or `Linking To`.

```{r}
get_dep("dplyr", "Imports")
get_dep("MASS", c("depends", "suggests"))
```

For more information on the different types of dependencies, see [the official guidelines](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Package-Dependencies) and [https://r-pkgs.org/description.html](https://r-pkgs.org/description.html).

In the output, the column `type` is the type of the dependency converted to lower case. Also, `LinkingTo` is now converted to `linking to` for consistency.

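The matching of dependency type names described above can be illustrated with a small base R helper. This is only a sketch under stated assumptions: `normalise_type()` is a hypothetical name that is not part of 'crandep', and the package's own handling of the second argument may differ.

```r
# Hypothetical helper, NOT part of crandep: maps the accepted spellings of a
# dependency type to the lower-case form used in the `type` column.
normalise_type <- function(type) {
  type <- tolower(type)                      # ignore letter case
  type <- gsub("[_ ]", "", type)             # "linking_to", "linking to" -> "linkingto"
  type[type == "linkingto"] <- "linking to"  # single canonical spelling
  valid <- c("depends", "imports", "linking to", "suggests", "enhances")
  if (!all(type %in% valid)) {
    stop("unknown dependency type: ",
         paste(setdiff(type, valid), collapse = ", "))
  }
  type
}

normalise_type(c("LinkingTo", "Linking_To", "linking to", "IMPORTS"))
```

All three spellings of `LinkingTo` collapse to the same value, which is why the two calls to `get_dep()` that follow are expected to agree.
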
```{r}
get_dep("xts", "LinkingTo")
get_dep("xts", "linking to")
```

For reverse dependencies, rather than adding the prefix "Reverse " to `type`, we use the argument `reverse`:

```{r}
get_dep("abc", "depends", reverse = TRUE)
get_dep("xts", "linking to", reverse = TRUE)
```

Theoretically, for each forward dependency

```{r, echo=FALSE}
data.frame(from = "A", to = "B", type = "c", reverse = FALSE)
```

there should be an equivalent reverse dependency

```{r, echo=FALSE}
data.frame(from = "B", to = "A", type = "c", reverse = TRUE)
```

Aligning the `type` in the forward and reverse dependencies enables this to be checked easily.

To obtain all types of dependencies, we can use `"all"` as the second argument, instead of typing a character vector of all 5 words:

```{r}
df0.rstan <- get_dep("rstan", "all")
dplyr::count(df0.rstan, type)
df1.rstan <- get_dep("rstan", "all", reverse = TRUE) # too many rows to display
dplyr::count(df1.rstan, type) # hence the summary using count()
```

## Building and visualising a dependency network

To build a dependency network, we have to obtain the dependencies of multiple packages. For illustration, we choose the [core packages of the tidyverse](https://www.tidyverse.org/packages/), and find out what each package `Imports`. We put all the dependencies into one data frame, in which the package in the `from` column imports the package in the `to` column. This is essentially the edge list of the dependency network.

```{r}
df0.imports <- rbind(
  get_dep("ggplot2", "Imports"),
  get_dep("dplyr", "Imports"),
  get_dep("tidyr", "Imports"),
  get_dep("readr", "Imports"),
  get_dep("purrr", "Imports"),
  get_dep("tibble", "Imports"),
  get_dep("stringr", "Imports"),
  get_dep("forcats", "Imports")
)
head(df0.imports)
tail(df0.imports)
```

With the help of the 'igraph' package, we can use this data frame to build a graph object that represents the dependency network.

```{r, out.width="660px", out.height="660px", fig.width=12, fig.height=12, fig.show="hold"}
g0.imports <- igraph::graph_from_data_frame(df0.imports)
set.seed(1457L)
old.par <- par(mar = rep(0.0, 4))
plot(g0.imports, vertex.label.cex = 1.5)
par(old.par)
```

The nature of a dependency network makes it a directed acyclic graph (DAG). We can use the 'igraph' function `is_dag()` to check.

```{r}
igraph::is_dag(g0.imports)
```

Note that this holds only for `Imports` (and `Depends`), due to their nature. A network of, for example, `Suggests` is not guaranteed to be acyclic.

## Boundary and giant component

It is possible to set a boundary on the nodes to which the edges are directed, using the function `df_to_graph()`. The second argument takes a data frame that lists such nodes in the column `name`.

```{r, out.width="660px", out.height="660px", fig.width=12, fig.height=12, fig.show="hold"}
df0.nodes <- data.frame(
  name = c("ggplot2", "dplyr", "tidyr", "readr", "purrr", "tibble", "stringr", "forcats"),
  stringsAsFactors = FALSE
)
g0.core <- df_to_graph(df0.imports, df0.nodes)
set.seed(259L)
old.par <- par(mar = rep(0.0, 4))
plot(g0.core, vertex.label.cex = 1.5)
par(old.par)
```

## Topological ordering of nodes

Since networks according to `Imports` or `Depends` are DAGs, we can obtain the [topological ordering](https://en.wikipedia.org/wiki/Topological_sorting) using, for example, [Kahn's (1962) sorting algorithm](https://doi.org/10.1145/368996.369025).

```{r}
topo_sort_kahn(g0.core)
```

In the topological ordering, represented by the column `id_num`, a low (high) number represents being at the front (back) of the ordering. If package A `Imports` package B, i.e. there is a directed edge from A to B, then A will be topologically before B. Within this bounded network, the package 'tibble' doesn't import any of the other core packages but is imported by most of them, so it naturally goes to the back of the ordering.

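To make the ordering rule above concrete, here is a minimal base R sketch of Kahn's algorithm applied to an edge list with `from` and `to` columns. It is an illustration only and not the implementation behind `topo_sort_kahn()`: the helper `kahn_order()`, the toy data frame `toy` and the output column `name` are assumptions made for this example, and only the column name `id_num` is borrowed from the text above.

```r
# Hypothetical helper, NOT part of crandep: Kahn's algorithm on an edge list.
kahn_order <- function(edges) {
  from <- as.character(edges$from)
  to <- as.character(edges$to)
  nodes <- unique(c(from, to))
  # In-degree of a node = number of edges pointing to it
  indeg <- setNames(integer(length(nodes)), nodes)
  tab <- table(to)
  indeg[names(tab)] <- as.integer(tab)
  # Start with the nodes that nothing points to
  queue <- names(indeg)[indeg == 0L]
  ordering <- character(0)
  while (length(queue) > 0L) {
    node <- queue[1L]
    queue <- queue[-1L]
    ordering <- c(ordering, node)
    # Removing `node` lowers the in-degree of every node it points to
    for (child in to[from == node]) {
      indeg[[child]] <- indeg[[child]] - 1L
      if (indeg[[child]] == 0L) queue <- c(queue, child)
    }
  }
  # Nodes left unprocessed can only be explained by a cycle
  if (length(ordering) < length(nodes)) stop("the graph contains a cycle")
  data.frame(name = ordering, id_num = seq_along(ordering),
             stringsAsFactors = FALSE)
}

# Toy network: A imports B and C, and B imports C
toy <- data.frame(from = c("A", "A", "B"), to = c("B", "C", "C"),
                  stringsAsFactors = FALSE)
kahn_order(toy)
```

In this toy network the only admissible ordering is A, B, C: A is imported by nothing and comes first, while C imports nothing and comes last, mirroring the position of 'tibble' in the core network.
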
This ordering may not be unique for a DAG, and other admissible orderings can be obtained by setting `random = TRUE` in the function:

```{r}
set.seed(387L)
topo_sort_kahn(g0.core, random = TRUE)
```

We can also apply the topological sorting to the bigger dependency network.

```{r}
df0.topo <- topo_sort_kahn(g0.imports)
head(df0.topo)
tail(df0.topo)
```

## Going forward

In [this other vignette](cran.html), we show how to obtain the dependency network of **all** CRAN packages using other functions in the package. The number of reverse dependencies can then be [modelled](degree.html).