---
title: "mathml: Translate R expressions to MathML and LaTeX"
author: "Matthias Gondan and Irene Alfarone (Universität Innsbruck, Department of Psychology)"
date: "2024-01-30"
output: rmarkdown::html_vignette
bibliography: bibliography.bibtex
preamble:
  - \usepackage{cancel}
  - \usepackage{amsmath}
vignette: >
  %\VignetteIndexEntry{mathml: Translate R expressions to MathML and LaTeX}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
library(mathml)
```

The R\ extension of the markdown language [@Xie2018;@rmarkdown] enables reproducible statistical reports with nice typesetting in HTML, Microsoft Word, and LaTeX. Moreover, R's manual pages have recently [@R, version 4.3] gained support for mathematical expressions [@Sarkar2022;@Viechtbauer2022], which is already a big improvement. However, except for special cases such as regression models [@equatiomatic] and R's own plotmath annotation, rules for the mapping of built-in language elements to their mathematical representation are still lacking. So far, R\ expressions such as `pbinom(k, N, p)` are printed as they are, and pretty mathematical formulae such as \(P_{\mathrm{Bi}}(X \le k; N, p)\) require explicit LaTeX commands like `P_{\mathrm{Bi}}\left(X \le k; N, p\right)`. Except for very basic use cases, these commands are tedious to type and their source code is hard to read.

The present R\ package defines a set of rules for the automatic translation of R\ expressions to mathematical output in R\ Markdown documents [@Xie2020] and Shiny Apps [@Chang2022]. The translation is done by an embedded Prolog interpreter that maps nested expressions recursively to MathML and LaTeX/MathJax, respectively. User-defined hooks enable extension of the set of rules, for example, to represent specific R\ elements by custom mathematical signs. The main feature of the package is that the same R\ expressions and equations can be used for both mathematical typesetting and calculations.
This saves time and potentially reduces mistakes, as will be illustrated below.

Like some other high-level programming languages (e.g., Lisp or Prolog), R is homoiconic, that is, R\ commands (i.e., R\ "calls") are themselves symbolic data structures that can be created, parsed, and modified. Because the default response of the R\ interpreter is to evaluate a call and return its result, this property is not transparent to the general user. There exist, however, a number of built-in R\ functions (e.g., `quote()`, `call()`) that allow the user to create R\ calls which can be stored in regular variables and then, for example, evaluated at a later stage or in a specific environment [@Wickham2019]. The present package includes a set of rules that translate such calls to a mathematical representation in MathML and LaTeX. For a first illustration of the _mathml_ package, we consider the binomial probability.

```{r}
term <- quote(pbinom(k, N, p))
term
```

The term is quoted to avoid its immediate evaluation (which would raise an error anyway since the variables `k`, `N`, `p` have not yet been defined). Experienced readers will remember that the quoted expression above is a short form for

```r
term <- call("pbinom", as.name("k"), as.name("N"), as.name("p"))
```

As can be seen from the output, the variable `term` is not assigned the result of the calculation, but an R\ call [see, e.g., @Wickham2019, for details on "non-standard evaluation"], which can eventually be evaluated with `eval()`,

```{r}
k <- 10
N <- 22
p <- 0.4
eval(term)
```

The R\ package _mathml_ can now be used to render the call in MathML, that is, the dialect for mathematical elements on HTML webpages, or in MathJax/LaTeX, as shown below.

```{r}
library(mathml)
mathjax(term)
```

Some of the curly braces are not really needed in the LaTeX output of this simple example, but are necessary in edge cases.
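That a call is ordinary data can be checked in base R alone. The following is a minimal sketch that does not use _mathml_; the variable names `t1`, `t2`, and `t3` are chosen for illustration only.

```r
# Two ways to build the same unevaluated call
t1 <- quote(pbinom(k, N, p))
t2 <- call("pbinom", as.name("k"), as.name("N"), as.name("p"))
identical(t1, t2)

# A call is a data structure: its elements can be inspected ...
class(t1)
as.list(t1)

# ... and modified, here by replacing the function name
t3 <- t1
t3[[1]] <- as.name("dbinom")
t3

# Evaluation with values supplied in a list instead of global variables
eval(t3, list(k = 10, N = 22, p = 0.4))
```

The last line evaluates the modified call with its own set of values, which is the sense in which a stored call can be evaluated "at a later stage or in a specific environment".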
The package also includes a function `mathout()` that wraps a call to `mathml()` for HTML output and `mathjax()` for LaTeX output. Moreover, the function `math(x)` adds the class `"math"` to its argument, such that a special knitr printing function is invoked [see the vignette on custom print methods in @knitr]. An R\ Markdown code chunk with `mathout(term)` thus produces:

````{r, echo=FALSE}
math(term)
````

Similarly, `inline()` produces inline output; `` `r "\x60r inline(term)\x60"` `` yields `r inline(term)`.

# Package _mathml_ in practice

_mathml_ is an R\ package for pretty mathematical representation of R\ functions and objects in data analysis, scientific reports and interactive web content. The currently supported features are listed below, roughly following the order proposed by Murrell and Ihaka [-@murrell2000].

## Basic elements

_mathml_ handles the basic elements of everyday mathematical expressions, such as numbers, Latin and Greek letters, multi-letter identifiers, accents, subscripts, and superscripts.

```{r}
term <- quote(1 + -2L + a + abc + "a" + phi + Phi + varphi + roof(b)[i, j]^2L)
math(term)

term <- quote(round(3.1415, 3L) + NaN + NA + TRUE + FALSE + Inf + (-Inf))
math(term)
```

An expression such as `1 + -2` may be considered unsatisfactory from an aesthetic perspective. It is correct R\ syntax, though, and is reproduced accordingly, without the parentheses. Parentheses around negated numbers or symbols can be added as shown above for `+ (-Inf)`.

If `round` is not given, R's default number of decimals from `getOption("digits")` is used (which is way too large in the authors' opinion, in line with the unfortunate practice of statistics programs).

To avoid name clashes with package _stats_, `roof()` is used instead of `hat()` to put a hat on a symbol (see next section for further decorations). Note that an R\ function `roof()` does not exist in base R; it is provided by the package for convenience and points to the identity function.
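Because such a decoration behaves like the identity function, a decorated expression can still be evaluated. The sketch below illustrates this with a stand-in `roof_sketch()`, a hypothetical name defined here so that the example runs without the package; it is not the package's own definition.

```r
# Stand-in for a decoration function: returns its argument unchanged
roof_sketch <- function(x) x

b <- c(2, 3, 5)
decorated <- quote(roof_sketch(b)^2L)   # the form that would be typeset
plain     <- quote(b^2L)                # the undecorated form

# Both calls evaluate to the same numbers
eval(decorated)
eval(plain)
identical(eval(decorated), eval(plain))
```

This is what allows one and the same expression to serve for both typesetting and calculation.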
## Decorations

The package offers some support for different fonts and colors as well as accents and boxes etc. Internally, these decorations are implemented as identity functions, so they can be introduced into R expressions without side-effects.

```{r}
term <- quote(bold(b[x, 5L]) + bold(b[italic(x)]) + italic(ab) + italic(42L))
math(term)

term <- quote(tilde(a) + mean(X) + boxed(c) + cancel(d) + phantom(e) + prime(f))
math(term)
```

Note that the font styles only affect the display of identifiers, whereas numbers, character strings etc. are left untouched.

Colors can be specified as color names, RGB-values (in decimal or hexadecimal format), HSL-values and HWB-values.

```{r}
term <- quote(color("red", tilde(a)) +
              color("rgb(255, 0, 0)", mean(X)) +
              color("#FF0000", boxed(c)) +
              color("hsl(0, 100%, 50%)", cancel(d)) +
              color("hwb(0 0% 0%)", prime(f)))
math(term)
```

## Operators and parentheses

Arithmetic operators and parentheses are translated as they are, as illustrated below.

```{r}
term <- quote(a - ((b + c)) - d*e + f*(g + h) + i/j + k^(l + m) + (n*o)^{p + q})
math(term)

term <- quote(dot(a, b) + frac(1L, nodot(c, d + e)) + dfrac(1L, times(g, h)))
math(term)
```

For multiplications involving only numbers and symbols, the multiplication sign is omitted. This heuristic does not always produce the desired result; therefore, _mathml_ defines alternative R\ functions `dot()`, `nodot()`, and `times()`. These functions calculate a product and produce the respective multiplication signs. Similarly, `frac()` and `dfrac()` can be used for small and large fractions.

For standard operators with known precedence, _mathml_ is generally able to detect if parentheses are needed; for example, parentheses are automatically placed around `d + e` in the `nodot`-example. However, we note unnecessary parentheses around `l + m` above.
These parentheses are a consequence of `quote(a^(b + c))` actually producing a nested R\ call of the form `'^'(a, (b + c))` instead of `'^'(a, b + c)`:

```{r}
term <- quote(a^(b + c))
paste(term)
```

For the present purpose, this feature is unfortunate because extra parentheses around `b + c` are not needed. The preferred result is obtained by using the functional form `quote('^'(k, l + m))` of the power, or curly braces as a workaround (see `p + q` above).

## Custom operators

Whereas for standard infix operators the parentheses typically follow the rules of precedence, undesirable results may be obtained with custom operators.

```{r}
term <- quote(mean(X) %+-% 2L * s / sqrt(N))
math(term)
term <- quote('%+-%'(mean(X), 2L * s / sqrt(N))) # functional form of '%+-%'
term <- quote(mean(X) %+-% {2L * s / sqrt(N)}) # the same
math(term)
```

The example is a reminder that it is not possible to define the precedence of custom operators in R: all `%...%` operators share the same fixed precedence, which is higher than that of `*` and `/`, and chains of such operators are evaluated from left to right. Again, the problem can be worked around by the functional form of the operator, or a curly brace to hide the parenthesis but enforce the correct operator precedence.

More operators are shown in Table\ 1, including the suggestions by Murrell and Ihaka [-@murrell2000] for graphical annotations and arrows in R\ figures.
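The parsing behavior discussed in this section can be inspected in base R without the package, because `quote()` only parses and does not evaluate. The following sketch looks at the head of each call; the names `e1` to `e4` are illustrative.

```r
# Custom %...% operators bind more tightly than * and /, so the top-level
# call of the unbraced expression is the division, not %+-%
e1 <- quote(mean(X) %+-% 2L * s / sqrt(N))
as.character(e1[[1]])

# With curly braces, %+-% becomes the top-level call
e2 <- quote(mean(X) %+-% {2L * s / sqrt(N)})
as.character(e2[[1]])

# Parentheses are kept in the parse tree as a call to `(` ...
e3 <- quote(a^(b + c))
as.character(e3[[3]][[1]])

# ... whereas the functional form stores b + c directly
e4 <- quote(`^`(a, b + c))
as.character(e4[[3]][[1]])
```

Since the renderer works on this parse tree, the extra `(` in `e3` and the misplaced `%+-%` in `e1` show up in the output exactly as R has parsed them.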
```{r custom-operators, echo=FALSE}
op1 <- list(
  "A\\ %*%\\ B"=quote(A %*% B),
  "A\\ %.%\\ B"=quote(A %.% B),
  "A\\ %x%\\ B"=quote(A %x% B),
  "A\\ %/%\\ B"=quote(A %/% B),
  "A\\ %%\\ B"=quote(A %% B),
  "A\\ &\\ B"=quote(A & B),
  "A\\ |\\ B"=quote(A | B),
  "xor(A,\\ B)"=quote(xor(A, B)),
  "!A"=quote(!A),
  "A\\ ==\\ B"=quote(A == B),
  "A\\ <-\\ B"=quote(A <- B))

m1 <- lapply(op1, FUN=mathout, flags=list(cat=FALSE))
op1 <- names(op1)
if(knitr::is_latex_output())
  op1 <- sapply(op1, FUN=knitr:::escape_latex)
if(knitr::is_html_output())
  op1 <- sapply(op1, FUN=xfun::html_escape, attr=TRUE)

op2 <- list(
  "A\\ !=\\ B"=quote(A != B),
  "A\\ ~ B"=quote(A ~ B),
  "A\\ %~~%\\ B"=quote(A %~~% B),
  "A\\ %==%\\ B"=quote(A %==% B),
  "A\\ %=~%\\ B"=quote(A %=~% B),
  "A\\ %prop%\\ B"=quote(A %prop% B),
  "A\\ %in%\\ B"=quote(A %in% B),
  "intersect(A,\\ B)"=quote(intersect(A, B)),
  "union(A,\\ B)"=quote(union(A, B)),
  "crossprod(A,\\ B)"=quote(crossprod(A, B)),
  "is.null(A)"=quote(is.null(A)))

m2 <- lapply(op2, FUN=mathout, flags=list(cat=FALSE))
op2 <- names(op2)
if(knitr::is_latex_output())
  op2 <- sapply(op2, FUN=knitr:::escape_latex)
if(knitr::is_html_output())
  op2 <- sapply(op2, FUN=xfun::html_escape, attr=TRUE)

op3 <- list(
  "A\\ %<->%\\ B"=quote(A %<->% B),
  "A\\ %->%\\ B"=quote(A %->% B),
  "A\\ %<-%\\ B"=quote(A %<-% B),
  "A\\ %up%\\ B"=quote(A %up% B),
  "A\\ %down%\\ B"=quote(A %down% B),
  "A\\ %<=>%\\ B"=quote(A %<=>% B),
  "A\\ %=>%\\ B"=quote(A %=>% B),
  "A\\ %<=%\\ B"=quote(A %<=% B),
  "A\\ %dblup%\\ B"=quote(A %dblup% B),
  "A\\ %dbldown%\\ B"=quote(A %dbldown% B),
  " "="")

m3 <- lapply(op3, FUN=mathout, flags=list(cat=FALSE))
op3 <- names(op3)
if(knitr::is_latex_output())
  op3 <- sapply(op3, FUN=knitr:::escape_latex)
if(knitr::is_html_output())
  op3 <- sapply(op3, FUN=xfun::html_escape, attr=TRUE)

t <- cbind(Operator=op1, Output=m1, Operator=op2, Output=m2, Operator=op3, Arrow=m3)
knitr::kable(t, caption="Table 1. 
Custom operators in mathml", row.names=FALSE, escape=FALSE)
```

## Builtin functions

There is support for most functions from package _base_, with adequate use and omission of parentheses.

```{r}
term <- quote(sin(x) + sin(x)^2L + cos(pi/2L) + tan(2L*pi) * expm1(x))
math(term)

term <- quote(choose(N, k) + abs(x) + sqrt(x) + floor(x) + exp(frac(x, y)))
math(term)
```

A few more examples are shown in Table\ 2, including functions from _stats_.

```{r base-stats, echo=FALSE}
op1 <- list(
  "sin(x)"=quote(sin(x)),
  "cosh(x)"=quote(cosh(x)),
  "tanpi(alpha)"=quote(tanpi(alpha)),
  "asinh(x)"=quote(asinh(x)),
  "log(p)"=quote(log(p)),
  "log1p(x)"=quote(log1p(x)),
  "logb(x,\\ e)"=quote(logb(x, e)),
  "exp(x)"=quote(exp(x)),
  "expm1(x)"=quote(expm1(x)),
  "choose(n,\\ k)"=quote(choose(n, k)),
  "lchoose(n,\\ k)"=quote(lchoose(n, k)),
  "factorial(n)"=quote(factorial(n)),
  "lfactorial(n)"=quote(lfactorial(n)),
  "sqrt(x)"=quote(sqrt(x)),
  "mean(X)"=quote(mean(X)),
  "abs(x)"=quote(abs(x)))

m1 <- lapply(op1, FUN=mathout, flags=list(cat=FALSE))
op1 <- names(op1)
if(knitr::is_latex_output())
  op1 <- sapply(op1, FUN=knitr:::escape_latex)
if(knitr::is_html_output())
  op1 <- sapply(op1, FUN=xfun::html_escape, attr=TRUE)

op2 <- list(
  "dbinom(k,\\ N,\\ pi)"=quote(dbinom(k, N, pi)),
  "pbinom(k,\\ N,\\ pi)"=quote(pbinom(k, N, pi)),
  "qbinom(p,\\ N,\\ pi)"=quote(qbinom(p, N, pi)),
  "dpois(k,\\ lambda)"=quote(dpois(k, lambda)),
  "ppois(k,\\ lambda)"=quote(ppois(k, lambda)),
  "qpois(p,\\ lambda)"=quote(qpois(p, lambda)),
  "dexp(x,\\ lambda)"=quote(dexp(x, lambda)),
  "pexp(x,\\ lambda)"=quote(pexp(x, lambda)),
  "qexp(p,\\ lambda)"=quote(qexp(p, lambda)),
  "dnorm(x,\\ mu,\\ sigma)"=quote(dnorm(x, mu, sigma)),
  "pnorm(x,\\ mu,\\ sigma)"=quote(pnorm(x, mu, sigma)),
  "qnorm(alpha/2L)"=quote(qnorm(alpha/2L)),
  "1L\\ -\\ pchisq(x,\\ 1L)"=quote(1L - pchisq(x, 1L)),
  "qchisq(1L\\ -\\ alpha,\\ 1L)"=quote(qchisq(1L-alpha, 1L)),
  "pt(t,\\ N\\ -\\ 1L)"=quote(pt(t, N-1L)),
  "qt(alpha/2L,\\ N\\ -\\ 1L)"=quote(qt(alpha/2L, N-1L)))

m2 <-
lapply(op2, FUN=mathout, flags=list(cat=FALSE))
op2 <- names(op2)
if(knitr::is_latex_output())
  op2 <- sapply(op2, FUN=knitr:::escape_latex)
if(knitr::is_html_output())
  op2 <- sapply(op2, FUN=xfun::html_escape, attr=TRUE)

t <- cbind(Function=op1, Output=m1, Function=op2, Output=m2)
knitr::kable(t, caption="Table 2. R functions from _base_ and _stats_",
  row.names=FALSE, escape=FALSE)
```

## Custom functions

For self-written functions, the matter is somewhat more complicated. For a function such as `g <- function(...) ...`, the name _g_ is not transparent to R, because only the function body is represented. We can still display functions in the form `head(x) = body` if we embed the object to be shown into a call `"<-"(head, body)`.

```{r}
sgn <- function(x)
{
  if(x == 0L) return(0L)
  if(x < 0L) return(-1L)
  if(x > 0L) return(1L)
}

math(sgn)
math(call("<-", quote(sgn(x)), sgn))
```

The function body is generally a nested R\ call of the form `'{'(L)`, with `L` being a list of commands (the semicolon, not necessary in R, is translated to a newline). As illustrated in the example, _mathml_ provides limited support for control structures such as `if`.

## Indices and powers

Indices in square brackets are rendered as subscripts, and powers as superscripts. Moreover, _mathml_ defines the functions `sum_over(x, from, to)` and `prod_over(x, from, to)` that simply return their first argument. The other two arguments serve as decorations (_to_ is optional), for example, for summation and product signs.
```{r}
term <- quote(S[Y]^2L <- frac(1L, N) * sum(Y[i] - mean(Y))^2L)
math(term)

term <- quote(log(prod_over(L[i], i==1L, N)) <- sum_over(log(L[i]), i==1L, N))
math(term)
```

## Ringing back to R

R's `integrate` function takes a number of arguments, the most important ones being the function to integrate, and the lower and the upper bound of the integration.

```{r}
term <- quote(integrate(sin, 0L, 2L*pi))
math(term)
eval(term)
```

For mathematical typesetting in the form of \(\int f(x)\, dx\), _mathml_ needs to find out the name of the integration variable. For that purpose, the underlying Prolog bridge provides a predicate `r_eval/3` that calls R\ from Prolog. In the example above, this predicate evaluates `formalArgs(args(sin))`, which returns the names of the arguments of `sin`, namely, `x`.

Note that in the example above, the quoted term is an abbreviation for `call("integrate", quote(sin), ...)`, with `sin` being an R\ symbol, not a function. While the R\ function `integrate()` can handle both symbols and functions, _mathml_ needs the symbol because it is unable to determine the function name of custom functions.

## Names and order of arguments

One of R's great features is the possibility to refer to function arguments by their names, not only by their position in the list of arguments. Prolog, in contrast, does not have such a feature. Therefore, the Prolog handlers for R\ calls are rather rigid. For example, `integrate/3` accepts exactly three arguments in a particular order and without names, so that a call such as `integrate(lower=0L, upper=2L*pi, sin)` would not print the desired result.

To "canonicalize" function calls with named arguments and arguments in unusual order, _mathml_ provides an auxiliary R\ function `canonical(f, drop)` that reorders the argument list of calls to known R\ functions and, if `drop=TRUE` (which is the default), also removes the names of the arguments.
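The idea of canonicalizing a call can be sketched in base R with `match.call()`. This is an illustration under the assumption that reordering by formal arguments is what is wanted; it is not the package's implementation of `canonical()`, and the names `term0`, `named`, and `positional` are chosen for this example.

```r
term0 <- quote(integrate(lower = 0L, upper = 2L * pi, sin))

# match.call() matches the arguments against the formals of integrate()
# and returns the call with full argument names in the order of the formals
named <- match.call(definition = stats::integrate, call = term0)
named

# Dropping the names yields a purely positional call
positional <- as.call(unname(as.list(named)))
positional
```

A positional call of this kind is the form that a handler with fixed arity, such as `integrate/3`, can match.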
```{r}
term <- quote(integrate(lower=0L, upper=2L*pi, sin))
canonical(term)
```

```{r}
math(canonical(term))
```

This function can be used to feed mixtures of partially named and positional arguments into the renderer. For details, see the R\ function `match.call()`.

## Matrices and Vectors

Of course, _mathml_ also supports matrices and vectors.

```{r}
v <- 1:3
math(call("t", v))

A <- matrix(data=11:16, nrow=2, ncol=3)
B <- matrix(data=21:26, nrow=2, ncol=3)
term <- call("+", A, B)
math(term)
```

Note that the seemingly more convenient `term <- quote(A + B)` yields \(A + B\) in the output instead of the desired matrix representation. This behavior is expected because quotation of an R\ call also quotes the components of the call (here, _A_ and _B_).

## Short mathematical names for R symbols

In R\ functions, variable names are typically longer than just single letters, which may yield unsatisfactory results in the mathematical output.

```{r}
term <- quote(dbinom(successes, Ntotal, prob))
hook(successes, k)
hook(quote(Ntotal), quote(N), quote=FALSE)
hook(prob, pi)
math(term)
hook(prob, p) # update hook
math(term)
```

To improve the situation, _mathml_ provides a simple hook that can be used to replace elements (e.g., verbose variable names) of the code by concise mathematical symbols, as illustrated in the example. To simplify notation, the `quote` flag of `hook()` defaults to `TRUE`, and `hook()` uses non-standard evaluation to unpack its arguments. If `quote` is `FALSE`, as shown above, the user has to provide the quoted expressions. Care should be taken to avoid recursive hooks such as `hook(s, s["A"])` that endlessly replace the \(s\) from \(s_{\mathrm{A}}\) as in \(s_{\mathrm{A}_{\mathrm{A}_{\mathrm{A}\cdots}}}\).

The hooks can also be used for more complex elements such as R\ calls, with dotted symbols representing Prolog variables.
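For plain symbols, the effect of such replacements resembles a one-pass substitution in base R. The sketch below is only an analogy: the package performs the replacement at the Prolog end, and the names `term0`, `map`, and `short` are illustrative.

```r
term0 <- quote(dbinom(successes, Ntotal, prob))

# Replacement table: verbose symbol -> short mathematical symbol
map <- list(successes = quote(k), Ntotal = quote(N), prob = quote(pi))

# substitute() does not evaluate its first argument, hence the do.call()
short <- do.call("substitute", list(term0, map))
short
```

Unlike a hook, `substitute()` makes a single pass over the call and knows nothing about pattern variables such as `.K`, so it cannot rewrite whole calls in the way shown next.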
```{r}
term <- quote(pbinom(successes, Ntotal, prob))
hook(pbinom(.K, .N, .P), sum_over(dbinom(i, .N, .P), i=0L, .K))
math(term)
```

The old behavior is restored with `unhook()`, applied to the hooked pattern.

```{r}
unhook(pbinom(.K, .N, .P))
math(term)
```

## Abbreviations

We consider the \(t\)-statistic for independent samples with equal variance. To avoid clutter in the equation, the pooled variance \(s^2_{\mathrm{pool}}\) is abbreviated, and a comment is given with the expression for \(s^2_{\mathrm{pool}}\). For this purpose, _mathml_ provides a function `denote(abbr, expr, info)`, with `expr` actually being evaluated, `abbr` being rendered, plus a comment of the form "with `expr` denoting `info`".

```{r}
hook(m_A, mean(X)["A"]) ; hook(s2_A, s["A"]^2L) ; hook(n_A, n["A"])
hook(m_B, mean(X)["B"]) ; hook(s2_B, s["B"]^2L)
hook(n_B, n["B"]) ; hook(s2_p, s["pool"]^2L)
term <- quote(t <- dfrac(m_A - m_B,
    sqrt(denote(s2_p, frac((n_A - 1L)*s2_A + (n_B - 1L)*s2_B, n_A + n_B - 2L),
                "the pooled variance.") * (frac(1L, n_A) + frac(1L, n_B)))))
math(term)
```

The term is evaluated below. `print()` is needed because the return value of an assignment of the form `t <- dfrac(...)` is not visible in R.

```{r}
m_A <- 1.5; s2_A <- 2.4^2; n_A <- 27; m_B <- 3.9; s2_B <- 2.8^2; n_B <- 20
print(eval(term))
```

## Context-dependent rendering

Consider an educational scenario in which we want to highlight a certain element of a term, for example, that a student has forgotten to subtract the null hypothesis in a \(t\)-ratio:

```{r}
t <- quote(dfrac(omit_right(mean(D) - mu[0L]), s / sqrt(N)))
math(t, flags=list(error="highlight"))
math(t, flags=list(error="fix"))
```

The R function `omit_right(a + b)` uses non-standard evaluation techniques [e.g., @Wickham2019] to return only the left part of an operation, and cancels the right part. This may not always be desired, for example, when illustrating how to fix the mistake.
For this purpose, the functions `mathml()` and `mathjax()` have an optional argument `flags`, which is a list with named elements. In this example, we use this argument to tell _mathml_ how to render such erroneous expressions using the flag `error`, which is one of `asis`, `highlight`, `fix`, or `ignore`. For more examples, see Table 3.

```{r mistakes, echo=FALSE}
op1 <- list(
  "omit_left(a\\ +\\ b)"=quote(omit_left(a + b)),
  "omit_right(a\\ +\\ b)"=quote(omit_right(a + b)),
  "list(quote(a),\\ quote(omit(b)))"=list(quote(a), quote(omit(b))),
  "add_left(a\\ +\\ b)"=quote(add_left(a + b)),
  "add_right(a\\ +\\ b)"=quote(add_right(a + b)),
  "list(quote(a),\\ quote(add(b)))"=list(quote(a), quote(add(b))),
  "instead(a,\\ b)\\ +\\ c"=quote(instead(a, b) + c))

asis <- lapply(op1, FUN=mathout, flags=list(cat=FALSE, error="asis"))
high <- lapply(op1, FUN=mathout, flags=list(cat=FALSE, error="highlight"))
fix  <- lapply(op1, FUN=mathout, flags=list(cat=FALSE, error="fix"))
igno <- lapply(op1, FUN=mathout, flags=list(cat=FALSE, error="ignore"))
op1 <- names(op1)
if(knitr::is_latex_output())
  op1 <- sapply(op1, FUN=knitr:::escape_latex)
if(knitr::is_html_output())
  op1 <- sapply(op1, FUN=xfun::html_escape, attr=TRUE)

t <- cbind(Operation=op1,
  "error\\ =\\ asis"=asis, highlight=high, fix=fix, ignore=igno)
knitr::kable(t, caption="Table 3. Highlighting elements of a term",
  row.names=FALSE, escape=FALSE)
```

Further customization requires the assertion of new Prolog rules `math/2`, `ml/3`, `jax/3`, as shown in the Appendix.

# Conclusion

This package allows R\ to render its terms in pretty mathematical equations. It extends the current features of R\ and existing packages for displaying mathematical formulas in R\ [@murrell2000], but most importantly, _mathml_ bridges the gap between computational needs, presentation of results, and their reproducibility. The package supports both MathML and LaTeX/MathJax for use in R\ Markdown documents, presentations and Shiny App webpages.
Researchers or teachers can already use R\ Markdown to conduct analyses and show results, and _mathml_ smooths this process and allows for integrated calculations and output. As shown in the case study of the previous section, _mathml_ can help to improve data analyses and statistical reports from an aesthetic perspective, as well as regarding reproducibility of research. Furthermore, the package may also allow for a better detection of possible mistakes in R\ programs.

Similar to most programming languages [@green1977], R\ code is notoriously hard to read, and the poor legibility of the language is one of the main sources of mistakes. For illustration, we consider again Equation\ 10 in Schwarz [-@schwarz1994].

````{r}
hook(mu_A, mu["A"])
hook(mu_B, mu["B"])
hook(sigma_A, sigma["A"])
hook(sigma_B, sigma["B"])

f1 <- function(tau)
{ dfrac(c, mu_A) + (dfrac(1L, mu_A) - dfrac(1L, mu_A + mu_B) *
    ((mu_A*tau - c) * pnorm(dfrac(c - mu_A*tau, sqrt(sigma_A^2L*tau))) -
     (mu_A*tau + c) * exp(dfrac(2L*mu_A*tau, sigma_A^2L)) *
       pnorm(dfrac(-c - mu_A*tau, sqrt(sigma_A^2L*tau)))))
}

math(f1)
````

The first version has a wrong parenthesis, which is barely visible in the code, whereas in the mathematical representation, the wrong curly brace is immediately obvious (the correct version is shown below for comparison).

````{r}
f2 <- function(tau)
{ dfrac(c, mu_A) + (dfrac(1L, mu_A) - dfrac(1L, mu_A + mu_B)) *
    ((mu_A*tau - c) * pnorm(dfrac(c - mu_A*tau, sqrt(sigma_A^2L*tau))) -
     (mu_A*tau + c) * exp(dfrac(2L*mu_A*tau, sigma_A^2L)) *
       pnorm(dfrac(-c - mu_A*tau, sqrt(sigma_A^2L*tau))))
}

math(f2)
````

As the reader may know from their own experience, missed parentheses are frequent causes of wrong results and errors that are hard to locate in programming code. This particular example shows that mathematical rendering can help to substantially reduce the number of careless errors in programming.

One limitation of the package is the lack of a convenient way to insert line breaks.
This is mostly due to lacking support by MathML and LaTeX renderers. For example, in its current stage, the LaTeX package breqn [@breqn] is mostly a proof of concept. Moreover, _mathml_ only works in one direction, that is, it is not possible to translate from LaTeX or HTML back to R [see @latex2r, for an example].

The package _mathml_ is available for R\ version 4.2 and later, and can be easily installed using the usual `install.packages("mathml")`. At its present stage, it supports output in HTML, LaTeX, and Microsoft Word [via pandoc, @pandoc]. The source code of the package is found at https://github.com/mgondan/mathml.

# Appendix A: For package developers

If you use `mathml` for your own package, please do not "Import" `mathml` in your DESCRIPTION, but "Depend" on it.

```
Package: onmathml
Type: Package
Title: A package that uses mathml
...
Depends: 
    R (>= 4.3),
    mathml (>= 1.3)
```

It's not entirely clear why this is needed.

# Appendix B: Customizing the package

## Implementation details

For convenience, the translation of the R expressions is achieved through a Prolog interpreter provided by another R\ package _rolog_ [@rolog]. If a version of SWI-Prolog [@swipl] is found on the system, _rolog_ connects to it. Alternatively, the SWI-Prolog runtime libraries can be conveniently accessed by installing the R\ package _rswipl_ [@rswipl].

Prolog is a classical logic programming language with many applications in expert systems, computer linguistics and symbolic artificial intelligence. The strength of Prolog lies in its concise representation of facts and rules for knowledge and grammar, as well as its efficient built-in search engine for closed world domains. Whereas Prolog is weak in statistical computation, but strong in symbolic manipulation, the converse may be said for the R\ language. _rolog_ bridges this gap by providing an interface to a SWI-Prolog distribution [@swipl] in R.
The communication between the two systems is mainly in the form of queries from R\ to Prolog, but two Prolog functions allow ring back and evaluation of terms in R.

The proper term for a Prolog "function" is predicate, and it is typically written with name and arity (i.e., number of arguments), separated by a forward slash. Thus, at the Prolog end, a predicate `math/2` translates the call `pbinom(K, N, Pi)` into a "function" `fn/2` with the name `P_Bi`, one argument `X =< K`, and the two parameters `N` and `Pi`.

```prolog
math(pbinom(K, N, Pi), M)
 => M = fn(subscript('P', "Bi"), (['X' =< K] ; [N, Pi])).
```

`math/2` operates like a "macro" that translates one mathematical element (here, `pbinom(K, N, Pi)`) to a different mathematical element, namely `fn(Name, (Args ; Pars))`. The low-level predicate `ml/3` is used to convert these basic elements to MathML.

```prolog
ml(Flags, fn(Name, (Args ; Pars)), M)
 => ml(Flags, Name, N),
    ml(Flags, paren(list(op(;), [list(op(','), Args), list(op(','), Pars)])), X),
    M = mrow([N, mo(&(af)), X]).
```

The relevant rule for `ml/3` builds the MathML entity `mrow([N, mo(&(af)), X])`, with `N` representing the name of the function and `X` its arguments and parameters, enclosed in parentheses. A corresponding rule `jax/3` does the same for MathJax/LaTeX. A list of flags can be used for context-sensitive translation (see, e.g., the section on errors above).

Several ways exist for translating new R\ terms to their mathematical representation. We have already seen above how to use "hooks" to translate long variable names from R to compact mathematical signs, as well as functions such as cumulative probabilities \(P(X \le k)\) to different representations like \(\sum_{i=0}^k P(X = i)\). Obviously, the hooks require that there already exists a rule to translate the target representation into MathML and MathJax.

In this appendix we describe a few more ways to extend the set of translations according to a user's needs.
As stated in the background section, the Prolog end provides two classes of rules for translation, macros `math/2,3,4` mirroring the R\ hooks mentioned above, and the low-level predicates `ml/3` and `jax/3` that create proper MathML and LaTeX terms.

## Linear models

To render the model equation of a linear model such as `lm(EOT ~ T0 + Therapy, data=d)` in mathematical form, it is sufficient to map the `Formula` in `lm(Formula, Data)` to its respective equation [see also @equatiomatic]. This can be done in two ways, using either the hooks described above, or a new `math/2` macro at the Prolog end.

````r
hook(lm(.Formula, .Data), .Formula)
````

The hook is simple, but is a bit limited because only R's tilde-form of linear models is shown, and it only works for a call with exactly two arguments. Below is an example of how to build a linear equation of the form \(Y = b_0 + b_1X_1 + \ldots\) using the Prolog macros from _mathml_.

````prolog
math_hook(LM, M) :-
    compound(LM),
    LM =.. [lm, ~(Y, Sum) | _Tail],
    summands(Sum, Predictors),
    findall(subscript(b, X) * X, member(X, Predictors), Terms),
    summands(Model, Terms),
    M = (Y == subscript(b, 0) + Model + epsilon).
````

The predicate `summands/2` unpacks an expression `A + B + C` to a list `[C, B, A]` and vice-versa (see the file `lm.pl` for details).

````{r}
rolog::consult(system.file(file.path("pl", "lm.pl"), package="mathml"))

term <- quote(lm(EOT ~ T0 + Therapy, data=d, na.action=na.fail))
math(term)
````

## \(n\)-th root

Base R does not provide a function like `cuberoot(x)` or `nthroot(x, n)`, and the present package does not support the respective representation. To obtain a cube root, a programmer would typically type `x^(1/3)` or better `x^{1/3}` (see the practice section for why the curly brace is preferred in an exponent), resulting in \(x^{1/3}\), which may still not match everyone's taste. Here we describe the steps needed to represent the \(n\)-th root as \(\sqrt[n]x\).
We assume that `nthroot(x, n)` is available in the current namespace [manually defined, or from R package _pracma_, @pracma], so that the names of the arguments and their order are accessible to `canonical()` if needed. As we can see below, _mathml_ uses a default representation `name(arguments)` for such unknown functions.

```{r}
nthroot <- function(x, n)
  x^{1L/n}

term <- canonical(quote(nthroot(n=3L, 2L)))
math(term)
```

A proper MathML term is obtained by `mlx/3` (the x in mlx indicates that it is an extension and is prioritized over the default `ml/3` rules). `mlx/3` recursively invokes `ml/3` for translating the function arguments _X_ and _N_, and then constructs the correct MathML entity `...`.

```prolog
mlx(nthroot(X, N), M, Flags) :-
    ml(X, X1, Flags),
    ml(N, N1, Flags),
    M = mroot([X1, N1]).
```

The explicit unification `M = ...` in the last line serves to avoid clutter in the head of `mlx/3`. The Prolog file `nthroot.pl` also includes the respective rule for LaTeX and can be consulted from the package folder via the underlying package _rolog_.

```{r}
rolog::consult(system.file(file.path("pl", "nthroot.pl"), package="mathml"))

term <- quote(nthroot(a * (b + c), 3L)^2L)
math(term)

term <- quote(a^(1L/3L) + a^{1L/3L} + a^(1.0/3L))
math(term)
```

The file `nthroot.pl` includes three more statements `precx/3` and `parenx/3`, as well as a `math_hook/2` macro. The first sets the operator precedence of the cube root above the power, thereby putting parentheses around nthroot in \((\sqrt[3]{\ldots})^2\). The second tells the system to increase the counter of the parentheses below the root, such that the outer parenthesis becomes a square bracket.

The last rule maps powers like `a^(1L/3L)` to `nthroot/2`, as shown in the first summand. Of course, _mathml_ is not a proper computer algebra system.
As is illustrated by the other terms in the sum, such macros are limited to purely syntactical matching, and terms like `a^{1L/3L}` with the curly brace or `a^(1.0/3L)` with a floating point number in the numerator are not detected.

## Proofs

_Mathml_ provides a basic representation for proof trees in the sequent calculus. The supported predicates are 'rbicond', 'rcond', 'rand', 'ror', 'rneg', 'lbicond', 'lcond', 'land', 'lor', 'lneg', each of which is available in a variant with 2 and a variant with 3 arguments. The first argument contains the expression for one line and the other arguments contain everything above it (which can, in turn, be other predicates from this list). To prevent the precedence setting by R and instead enable the precedences defined in Prolog, the prefix notation should be used.

```{r}
rolog::consult(system.file(file.path("pl", "bussproofs.pl"), package="mathml"))

term <- quote(rcond('%>%'(P %->% P), ax(P %>% P, '')))
math(term)

term <- quote(ror('%>%'('', '%|%'(A, ~A)),
  rneg('%>%'('', '%,%'(A, ~A)), ax('%>%'(A, A), ''))))
math(term)

term <- quote(
  rcond('%>%'('%->%'('%|%'(A, B), ('&'(A, B)))),
    rand('%>%'('%|%'(A, B), '&'(A, B)),
      lor('%>%'('%|%'(A, B), A), ax('%>%'(A, A), ''), asq(B%<%A, '')),
      lor(A%|%B%>%B, asq('%<%'(A, B), ''), ax('%>%'(B, B), '')))))
math(term)
```

## P-values

With `pval/2`, a value specified as first argument can be displayed with the following rounding rules:

* 0.1 < value < 1: two digits
* 0.001 < value < 0.1: three digits
* value < 0.0001: output defaults to "< 0.0001".

The identifier for the value is provided as second argument.

```{r}
library(mathml)
rolog::consult(system.file(file.path("pl", "pval.pl"), package="mathml"))

term <- quote(pval(0.539, P))
math(term)

term <- quote(pval(0.0137, p))
math(term)

term <- quote(pval(0.0003, P))
math(term)
```

The file `prolog/pval.pl` illustrates how to distinguish the above cases in Prolog.
Two nested `math_hook` macros are used to decide if an equation sign is needed or the less-than sign, and to determine the number of decimal places shown.

# Acknowledgment

Supported by the Erasmus+ program of the European Commission (2019-1-EE01-KA203-051708).

# References