% -*- mode: noweb; noweb-default-code-mode: R-mode; -*- \documentclass[nojss]{jss} \usepackage{dsfont} \usepackage{bbm} \usepackage{amsfonts} \usepackage{wasysym} \usepackage{amssymb} \usepackage{wrapfig} %\title{Programmers' Niche: Efficient Multivariate polynomials in \R} %\subtitle{The \pkg{spray} package} %\author{Robin K. S. Hankin} %\maketitle \author{Robin K. S. Hankin\\University of Stirling} \title{Sparse arrays and multivariate polynomials in \proglang{R}} %\VignetteIndexEntry{A vignette for the spray package} %% for pretty printing and a nice hypersummary also set: \Plainauthor{Robin K. S. Hankin} \Plaintitle{Sparse arrays and multivariate polynomials in R} \Shorttitle{Sparse arrays and multivariate polynomials in \proglang{R}} \Abstract{In this short article I introduce the \pkg{spray} package, which provides some functionality for handling sparse arrays. The package uses the~\proglang{C++} Standard Template Library's \proglang{map} class~\citep{musser2009} to store and retrieve elements. One natural application for sparse arrays is multivariate polynomials and I give two examples of the package in use, one drawn from the fields of random walks on lattices and one from the field of recreational combinatorics. To cite the \pkg{spray} package, use \citet{hankin2022_spray}. } \Keywords{Multivariate polynomials, \proglang{R}, sparse arrays} \Plainkeywords{Multivariate polynomials, R} %% publication information %% NOTE: This needs to filled out ONLY IF THE PAPER WAS ACCEPTED. %% If it was not (yet) accepted, leave them commented. %% \Volume{13} %% \Issue{9} %% \Month{September} %% \Year{2004} %% \Submitdate{2004-09-29} %% \Acceptdate{2004-09-29} %% \Repository{https://github.com/RobinHankin/spray} %% this line for Tragula %% The address of (at least) one author should be given %% in the following format: \Address{ Robin K. S. Hankin\\%\orcid{https://orcid.org/0000-0001-5982-0415}\\ University of Stirling\\ Scotland } %% It is also possible to add a telephone and fax number %% before the e-mail in the following format: %% Telephone: +43/1/31336-5053 %% Fax: +43/1/31336-734 %% for those who use Sweave please include the following line (with % symbols): %% need no \usepackage{Sweave.sty} %% end of declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% <>= ignore <- require(spray) options(prompt = "R> ", continue = "+ ", width = 70, useFancyQuotes = FALSE) @ \SweaveOpts{} \begin{document} \section{Introduction} \setlength{\intextsep}{0pt} \begin{wrapfigure}{r}{0.2\textwidth} \begin{center} \includegraphics[width=1in]{\Sexpr{system.file("help/figures/spray.png",package="spray")}} \end{center} \end{wrapfigure} The \pkg{multipol} package~\citep{hankin2008} furnishes the \proglang{R} programming language with functionality for multivariate polynomials. However, the \pkg{multipol} package was noted as being inefficient in many common cases: the package stores multivariate polynomials as arrays and this often involves storing many zero elements which consume computational and memory resources unnecessarily. One suggestion was to use sparse arrays---in which nonzero elements are stored along with an index vector describing their coordinates---instead of arrays. In this short document I introduce the \pkg{spray} package which provides functionality for sparse arrays and interprets them as multivariate polynomials. Some of the underlying design philosophy is discussed in the appendix. Here, `sparse multinomial' is defined as one whose array representation is sufficiently sparse to make taking advantage of is sparseness worthwhile. However, other definitions of sparseness may be useful and I outline some below. \subsection{Existing work} The~\pkg{slam} package~\citep{hornik2014} provides some sparse array functionality but is not intended to interpret arbitrary dimensional sparse arrays as multivariate polynomials; the \pkg{rSympy} package does not, as of 2017, implement sparse multivariate polynomials. The \pkg{mpoly}~\citep{kahle2013} package handles multivariate polynomials but does not accept negative powers, nor is it designed for efficiently processing large multivariate polynomials; I present some timings below. The \pkg{mpoly} package is different in philosophy from both the~\pkg{spray} package and~\pkg{multipol} in that~\pkg{mpoly} is more ``symbolic'' in the sense that it admits---and handles appropriately---named variables, whereas my packages do not make any reference to the {\em names} of the variables. As~\citeauthor{kahle2013} points out, naming the variables allows a richer and more natural suite of functionality; straightforward \pkg{mpoly} idiom is somewhat strained in \pkg{spray}. (The \code{mvp} package~\citep{hankin2022_mvp} is now available; this uses a more powerful concept of sparsity). \section{Sparse arrays} Base R has extensive support for multidimensional arrays. Consider <>= a <- array(1, dim=2:5) @ \noindent The resulting object requires storage of~$2\times 3\times 4\times 5=120$ floating point numbers, which are represented in an elegant format amenable to Cartesian extraction and replacement. However, arrays in which many of the elements are zero are common and in this case storing only the nonzero elements and their positions would be a more compact and efficient representation. To create a sparse array object in the \pkg{spray} package, one specifies a matrix of indices \code{M} with each row corresponding to the position of a nonzero element, and a numeric vector of values: <>= library("spray") M <- matrix(c(0, 0, 0, 1, 0, 0, 1, 1, 1, 2, 0, 3), ncol=3) M S1 <- spray(M, 1:4) S1 @ Thus \code{S1[0,0,2] = 2}. The representation of the spray object does not preserve the order of the index rows in the argument, although a particular index row is associated unambiguously with a unique numeric value. This is because the \proglang{STL} map class does not preserve the orders of its elements. This does not matter, as the order in which the elements are stored is immaterial in the use-cases presented here. Extract and replace methods require the index to be a matrix\footnote{Indexing with a vector (interpreted as a row of the index matrix) is problematic. The package requires idiom such as~\code{S[1,2,3]} and~\code{S[1,1:3,3]} to work as expected; and because \code{[.spray()} and \code{[<-.spray()} dispatch on the first argument, the package does not attempt to guess what the user intended.}: <>= S1[diag(3)] <- -3 S1 @ \noindent We can see that a value with an existing index is overwritten, while new elements are created as necessary. Addition is implemented: <>= M2 <- matrix(c( 6, -7, 8, 0, 0, 2, 1, 1, 3), byrow=TRUE, ncol=3) S2 <- spray(M2, c(17, 11 ,-4)) S2 S1 <- S1 + S2 S1 @ \noindent Thus element~\code{[0,0,2]} becomes $2+11=13$, while element~\code{[1,1,3]} cancels and thus vanishes. There is no requirement for indices to be positive: Element~\code{[6,-7,8]} is new (with value~17). Even though the representation of the spray object does not preserve the order of the index rows in the argument, a particular index row is associated unambiguously with a unique numeric value. The package uses \pkg{disordR} discipline, discussed below in section~\ref{disord}. \section{The spray package and multivariate polynomials} One natural application for \code{spray} objects is multivariate polynomials~\citep{hankin2008}. I will first discuss the univariate case and then progress to multivariate polynomials. \subsection{Univariate polynomials} Univariate polynomials are a good place to start. Suppose the polynomial \[ A = 1+2x^3 + 6x^8 \] \noindent were to be represented using R objects. One natural approach, taken in the \pkg{polynomial} package~\citep{venables2016}, is to store the coefficients in a vector: <>= library("polynom") A <- polynomial(c(1, 0, 0, 2, 0, 0, 0, 0, 6)) dput(A) A @ \noindent But again see how the \proglang{R} object thus created stores zero elements, which can be problematic if the polynomial in question is large degree and sparse. Similar issues arise in the case of multivariate polynomials. The~\pkg{multipol} package~\citep{hankin2008} uses a similar methodology, storing coefficients as an arbitrary-dimensional array. However, as noted above, this often leads to inefficient computation. \subsection{Multivariate polynomials} One natural and useful interpretation of a sparse array is as a multivariate polynomial. Consider the following sparse array: <>= S3 <- spray(matrix(c(0,0,0, 0,0,1, 1,1,1, 3,0,0), byrow=TRUE, ncol=3), 1:4) S3 @ \noindent It is natural to interpret the rows of the index matrix as powers of different variables of a multivariate polynomial, and the values as being the coefficients. This is realized in the package using the \code{polyform} option, which if set to \code{TRUE}, modifies the print method: <>= options(polyform = TRUE) S1 @ \noindent (only the print method has changed; \code{S1} is as before). The print method interprets, by default, the three columns as variables $x,y,z$ although this behaviour is user-definable. With this interpretation, multiplication and addition have natural definitions as multivariate polynomial multiplication and addition: <>= S1 + S2 S1 * S2 S1^2 @ It is possible to introduce an element of symbolic calculation, exhibiting familiar algebraic identities. Consider the \code{lone()} function, which creates a sparse array whose multivariate polynomial interpretation is a single variable: <>= x <- lone(1, 3) y <- lone(2, 3) z <- lone(3, 3) options(polyform = FALSE) list(x, y, z) options(polyform = TRUE) (1 + x + y)^3 (x + y) * (y + z) * (x + z) - (x + y + z) * (x*y + x*z + y*z) (x + y) * (x - y) - (x^2 - y^2) @ \subsection{The null polynomial and arity issues} The package is intended to provide functionality for sparse arrays, one interpretation of which is multivariate polynomials. The package, implementing sparse arrays, forbids the addition of two sparse arrays with different dimensionalities: \begin{Schunk} \begin{Sinput} R> lone(1, 2) + lone(1, 1) \end{Sinput} \begin{Soutput} Error: arity(S1) == arity(S2) is not TRUE \end{Soutput} \end{Schunk} One problematic object is the empty array. Zero multinomials are represented with a zero-row index matrix and zero-length numeric vector of values. Because a spray object is a sparse array, a zero multinomial must have a specific arity\footnote{This philosophy is different from earlier versions of the software which treated the empty array as the zero multinomial, with which addition and multiplication were defined algebraically. I would like to thank an anonymous \emph{R Journal} referee for this insight.}. \subsection{Algebraic identities} Similar but more involved techniques can be used to prove Euler's four-square identity and Degen's eight-square identity, given in the package's test suite. However, it should be noted that the \pkg{mpoly} package has a more natural idiom and does not suffer from the visual defect of arbitrary term ordering. \subsection{Further functionality} Multivariate polynomials have a natural interpretation as functions: <>= (S4 <- spray(cbind(1:3, 3:1), 1:3)) f <- as.function(S4) f(c(1, 2)) @ The last line showing the result of substituting~$x=1,y=2$ into~\code{S4}. Other algebraic operations include substitution and partial differentiation. Consider the homogeneous polynomial in three variables: <>= (S5 <- homog(3, 3)) @ Interpreting~\code{S5} as a multivariate polynomial with variables~$x,y,z$ we may substitute~$y=5$ using the \code{subs()} function: <>= subs(S5, 2, 5) @ Differentiation is also straightforward. Suppose we wish to calculate the multivariate polynomial corresponding to \[ \frac{\partial^6}{\partial x\,\partial^2y\,\partial^3z} \left(xyz + x+2y+3z\right)^3 \] This would be <<>>= aderiv((xyz(3) + linear(1:3))^3, 1:3) @ \section{Manipulation of coefficients}\label{disord} The \pkg{spray} package uses \pkg{disordR} discipline~\citep{hankin2022_disordR} for manipulation of the coefficients. Thus: <<>>= set.seed(0) (A <- rspray()) coeffs(A) @ We see above that the coefficients of object \code{A} are not an ordinary \proglang{R} vector but a \code{disord} object. Because the coefficients are stored in an implementation-specific order, we cannot access or manipulate \code{coeffs(a)[2]}, for example. But some operations are allowed; we may consider the coefficients modulo 3: <<>>= coeffs(A) <- coeffs(A) %% 3 A @ A detailed motivation and use-case for \pkg{spray} is provided by~\citet{hankin2022_disordR}. \section{The package in use: some examples} Multivariate polynomials are useful and efficient structures in a variety of applications. Here I give two examples of the package in use: One drawn from the field of random walks on lattices, and one from recreational combinatorics. \subsection{Random walks on lattices} Random walks on periodic lattices find application in a wide range of applied mathematics including the study of molecular and ionic crystals~\citep{hollander1982}, polymers~\citep{scheunders1989} and photosynthetic units~\citep{montroll1969}. The basic idea is that some entity (exciton, ion, etc) has a well-defined position on a periodic lattice~$\left(\mathbb{Z}/n\mathbb{Z}\right)^d$; it then moves on the lattice, performing a random walk. The examples here have~$d=2$ but extension to arbitrary dimensions is immediate. The periodic lattice itself may be identified with a multivariate polynomial in~$d$ variables, here~$x$ and~$y$; the probability of the entity being at point~$(n,m)$ is the coefficient of~$x^ny^m$. The entity typically moves between adjacent nodes according to a \emph{kernel} polynomial whose coefficients are the probabilities of the moves. Periodicity may be enforced simply by wrapping the polynomial using modular arithmetic. In many cases, entities are not necessarily conserved: the entity may decay (usually with a fixed probability per timestep), or be annihilated when it encounters a particular node in the lattice (a `trap'~\citep{scheunders1989}, corresponding to the formation of sugar in the cell). Here, we work on a~$17\times 17$ lattice as a large but computationally tractable domain. All these processes have natural and efficient~\proglang{R} idiom in the \pkg{spray} package. We may specify a kernel allowing movement to adjacent nodes, or to stay in the same place with equal probability: <<>>= d <- 2 kernel <- spray(rbind(0, diag(d), -diag(d)))/(1 + 2*d) @ \noindent At the first timestep, the the entity is at, say, point~$\left(10,10\right)$ with probability 1: <<>>= initial <- spray(rep(10, d)) @ \noindent Finding the probability mass function of the entity after, say 14 timesteps, is straightforward: <<>>= t14 <- initial * kernel^14 @ Traps may be assigned using standard indexing and we will work with a~$17\times 17$ array: <<>>= traps <- matrix(c(2, 3, 3, 5), 2, 2) n <- 17 @ Then we may calculate the evolution of the probability mass function as follows: <<>>= timestep <- function(state, kernel, traps){ state <- state * kernel state <- spray(index(state)%%n, coeffs(state), addrepeats = TRUE) state[traps] <- 0 return(state) } @ In function \code{timestep()}, the first line uses standard multivariate polynomial multiplication to advance the state of the entity; the second enforces periodic boundary conditions, and the third implements the traps' annihilation of the entity. The probability of the entity still existing after 100 timesteps is then: <<>>= state <- initial for(i in 1:100){state <- timestep(state, kernel, traps)} sum(coeffs(state)) @ Note the streamlined \proglang{R} idiom: It is not clear how such manipulations could be performed using the \pkg{mpoly} or the \pkg{multipol} packages. \subsection{Recreational combinatorics} Suppose we consider a chess knight and ask how many ways are there for the knight to return to its starting square in 6 moves. Such questions are most naturally answered by using generating functions. On an infinite chessboard, we might define the multivariate generating polynomial\footnote{Standard terminology, although it might be more accurately referred to as a multivariate Laurent polynomial.}~$k$ for a knight as \[ k = x^{2}y + x^2y^{-1} + x^{-2}y + x^{-2}y^{-1} + xy^{2} + xy^{-2} + x^{-1}y^{2} + x^{-1}y^{-2} \] where we have identified powers of~$x$ with squares moved horizontally (counted algebraically, negative powers mean move to the left), and powers of~$y$ with squares moved vertically. Then the coefficient of~$x^{a}y^{b}$ in~$k$ is the number of ways of moving from the origin [that is, $x^{0}y^{0}$] to square~$(a,b)$. Similarly, $k^n$ is the generating function for a knight which makes~$n$ moves: The coefficient of~$x^{a}y^{b}$ in~$k^n$ is the number of ways of moving from the origin to square~$(a,b)$. The \proglang{R} idiom for this is straightforward; we define \code{chess\_knight}, a spray object with rows corresponding to the possible moves the chess piece may make: <>= chess_knight <- spray(matrix( c(1, 2, 1, -2, -1, 2, -1, -2, 2, 1, 2, -1, -2, 1, -2, -1), byrow = TRUE,ncol = 2)) options(polyform = FALSE) chess_knight options(polyform = TRUE) chess_knight @ \noindent Then \code{chess\_knight[i,j]} gives the number of ways the piece can move from square~\code{[0,0]} to~\code{[i,j]}; and~\code{(chess\_knight\^{}n)[i,j]} gives the number of ways the piece can reach~\code{[i,j]} in \code{n} moves. To calculate the number of ways that the piece can return to its starting square we simply raise \code{chess\_knight} to the sixth power and extract the \code{[0,0]} coefficient: <>= constant(chess_knight^6, drop = TRUE) @ (function \code{constant()} extracts the coefficient corresponding to zero power). One natural generalization would be to arbitrary dimensions. A $d$-dimensional knight moves two squares in one direction, followed by one square in another direction: <>= knight <- function(d){ n <- d * (d - 1) out <- matrix(0, n, d) jj <- cbind(rep(seq_len(n), each=2), c(t(which(diag(d)==0, arr.ind=TRUE)))) out[jj] <- seq_len(2) spray(rbind(out, -out, `[<-`(out, out==1, -1),`[<-`(out, out==2, -2))) } @ Then, considering a four-dimensional chessboard (Figure~\ref{four_dimensional_knight}): \begin{figure}[h] \centering \includegraphics[width=10cm]{four_dimensional_knight.pdf} \caption{Four-dimensional knight\label{four_dimensional_knight} on a $4\times 4\times 4\times 4$ board. Cells attacked by the knight shown by dots} \end{figure} <>= constant(knight(4)^6, drop = TRUE) @ It is in such cases that the efficiency of the \proglang{map} class becomes evident: On my system (3.4\,GHz Intel Core i5 iMac), the above call took just under~0.4 seconds of elapsed time whereas the same\footnote{Because \pkg{mpoly} does not accept negative powers, the calculation was equivalent to \code{(knight(4) + xyz(4)\^{}2)\^{}6}. Also note that the \pkg{multipol} package is not able to execute these commands in a reasonable time.} calculation took over 173 seconds using \pkg{mpoly}. If we want the number of ways to return to the starting point in 6 or fewer moves, we can simply add the unit multinomial and take the sixth power of the sum: <>= constant((1 + knight(4))^6, drop=TRUE) @ (0.6 seconds for~\pkg{spray} vs 275 seconds for~\pkg{mpoly}). For 8 moves, the differences are more pronounced, with \pkg{spray} taking 4.0 seconds and~\pkg{mpoly} requiring more than 1500 seconds). \section{Conclusions and further work} The \pkg{spray} package provides functionality for sparse, arbitrarily-dimensioned arrays. One natural interpretation of a sparse array is as a multivariate polynomial and the package leverages the \proglang{map} class of \proglang{C++} to give fast polynomial multiplication. The functionality provided overlaps with that of \pkg{multipol} and \pkg{mpoly}. The \pkg{multipol} package is too slow to be of practical value for any but the smallest illustrative objects. The different name-based philosophy employed by the \pkg{mpoly} package is certainly an advantage in terms of natural \proglang{R} idiom, although there is a performance penalty. There are also occasional applications of multivariate polynomials (such as random walks on lattices) in which the structure of \pkg{spray} is a conceptual advantage. British mathematician J. H. Wilkinson famously defined a matrix to be ``sparse'' if it has enough zeros that it pays to take advantage of them; this definition applies to the arrays considered here. However, other definitions of sparsity are possible. Consider the following example, taken from \cite{kahle2013}: \[ ab^2+bc^2+cd^2+\cdots+yz^2+za^2. \] The \pkg{spray} idiom for such an expression is <<>>= a <- diag(26) options(sprayvars = letters) a[1 + cbind(0:25, 1:26) %% 26] <- 2 spray(a) @ \noindent ---but it is clear that the index matrix has a large degree of sparseness which \pkg{mpoly} takes advantage of and \code{spray} does not. Further work might include the development of name-based multivariate polynomials using concepts from \proglang{STL} to provide the best of both worlds (and indeed the \pkg{mvp} package~\citep{hankin2022_mvp} does just this, and more). \bibliography{spray} \newpage \appendix \section{Package philosophy} The \pkg{spray} package does not interact or depend on \pkg{multipol} in any way, owing to the very different design philosophies used. The package uses the~\proglang{C++} Standard Template Library's \proglang{map} class~\citep{musser2009} to store and retrieve elements. A \emph{map} is an associative container that stores values indexed by a key, which is used to sort and uniquely identify the values. In the package, the key is a \proglang{vector} object or a \proglang{deque} object with (signed) integer elements. In the \proglang{STL}, a \proglang{map} object stores keys and associated values in whatever order the software considers to be most propitious. This allows faster access and modification times but the order in which the maps are stored is implementation specific. In the case of sparse arrays, this is not an issue because the nonzero entries do not possess a natural order, unlike dense arrays in which lexicographic ordering is used. For multivariate polynomials, the order of storage is not important algebraically because addition is commutative and associative. These issues are accommodated using \pkg{disordR} discipline which forbids implementation-specific idiom~\citep{hankin2022_disordR}. \subsection*{Compile-time options} At compile time, the package offers two options. Firstly one may use the \proglang{unordered\_map} class in place of the \proglang{map} class. This option is provided in the interests of efficiency. An unordered map has lookup time~${\mathcal O}(1)$ (compare~${\mathcal O}(\log n)$ for the map class), but overhead is higher. The other option offered is the nature of the key, which may be either \proglang{vector} class or \proglang{deque} class. Elements of a \proglang{vector} are guaranteed to be contiguous in memory, unlike a \proglang{deque}. This does not appear to make a huge difference to timings, but the default (\proglang{unordered\_map} indexed by a \proglang{vector}) appears to be marginally the fastest option. \end{document}