\documentclass[a4paper]{article}

%\VignetteIndexEntry{Sweave User Manual}
%\VignettePackage{utils}
%\VignetteDepends{tools}
%\VignetteDepends{datasets}
%\VignetteDepends{stats}

\title{Sweave User Manual}
\author{Friedrich Leisch}

\usepackage[round]{natbib}
\usepackage{graphicx,Rd}
\usepackage{listings}

\lstset{frame=trbl,basicstyle=\small\tt}

\sloppy

\begin{document}

\maketitle

\section{Introduction}
\label{sec:intro}

Sweave provides a flexible framework for mixing text and R code for
automatic document generation. A single source file contains both
documentation text and R code, which are then \emph{woven} into a
final document containing
\begin{itemize}
\item the documentation text together with
\item the R code and/or
\item the output of the code (text, graphs)
\end{itemize}
This makes it possible to regenerate a report when the input data
change, and it documents the code needed to reproduce the analysis in
the same file that contains the report. The R code of the complete
analysis is embedded into a \LaTeX{}
document\footnote{\url{http://www.ctan.org}} using the noweb syntax
\citep{flm:Ramsey:1998}, which is usually used for literate
programming \citep{fla:Knuth:1984}. Hence, the full power of \LaTeX{}
(for high-quality typesetting) and R (for data analysis) can be used
simultaneously. See \cite{e1071-papers:Leisch:2002} and references
therein for more general thoughts on dynamic report generation and
pointers to other systems.

Sweave has a modular design that uses different drivers for the actual
translations. Different drivers are obviously needed for different
text markup languages (\LaTeX{}, HTML, \ldots). Several packages on
CRAN provide support for other word processing systems.

\section{Noweb files}
\label{sec:noweb}

Noweb \citep{flm:Ramsey:1998} is a simple literate-programming tool
which allows program source code and the corresponding documentation
to be combined in a single file. Different programs allow the
documentation and/or the source code to be extracted.
A noweb file is a simple text file which consists of a sequence of
code and documentation segments; these segments are called
\emph{chunks}:
\begin{description}
\item[Documentation chunks] start with a line that has an at sign
  (\verb|@|) as first character, followed by a space or newline
  character. The rest of this line is a comment and ignored.
  Typically documentation chunks will contain text in a markup
  language like \LaTeX{}.
\item[Code chunks] start with \verb|<<name>>=| at the beginning of a
  line; again the rest of the line is a comment and ignored.
\end{description}
The default for the first chunk is documentation.

In the simplest usage of noweb, the (optional) names of code chunks
give the names of source code files, and the tool \texttt{notangle}
can be used to extract the code chunks from the noweb file. Multiple
code chunks can have the same name; the corresponding code chunks are
then concatenated when the source code is extracted. Noweb has some
additional mechanisms to cross-reference code chunks (the
\verb|[[...]]| operator, etc.). Sweave currently neither uses nor
supports these features, hence they are not described here.

\section{Sweave files}
\label{sec:sweavefile}

\subsection{A simple example}

Sweave source files are regular noweb files with some additional
syntax that gives additional control over the final output.
Traditional noweb files have the extension \texttt{.nw}, which is also
fine for Sweave files (and fully supported by the software).
Additionally, Sweave currently recognizes files with extensions
\texttt{.rnw}, \texttt{.Rnw}, \texttt{.snw} and \texttt{.Snw} as
noweb files with Sweave extensions. We will use \texttt{.Rnw}
throughout this document.

A minimal Sweave file is shown in Figure~\ref{fig:ex1.Rnw}, which
contains two code chunks embedded in a simple \LaTeX{} document.
Running
<<>>=
rnwfile <- system.file("Sweave", "example-1.Rnw", package="utils")
Sweave(rnwfile)
@
translates this into the \LaTeX{} document shown in
Figures~\ref{fig:ex1.tex} and~\ref{fig:ex1.pdf}. The latter can also
be created directly from within R using
<<>>=
library("tools")
texi2dvi("example-1.tex", pdf=TRUE)
@

The first difference between \texttt{example-1.Rnw} and
\texttt{example-1.tex} is that the \LaTeX{} style file
\texttt{Sweave.sty} is automatically loaded, which provides
environments for typesetting R input and output (the \LaTeX{}
environments \texttt{Sinput} and \texttt{Soutput}). Otherwise, the
documentation chunks are copied without any modification from
\texttt{example-1.Rnw} to \texttt{example-1.tex}.

\begin{figure}[htbp]
  \centering
  \begin{minipage}{0.9\textwidth}
    \lstinputlisting{\Sexpr{rnwfile}}
  \end{minipage}
  \caption{A minimal Sweave file: \texttt{example-1.Rnw}.}
  \label{fig:ex1.Rnw}
\end{figure}

The real work of Sweave is done on the code chunks. The first code
chunk has no name, hence the default behavior of Sweave is used, which
transfers both the R commands and their respective output to the
\LaTeX{} file, embedded in \texttt{Sinput} and \texttt{Soutput}
environments, respectively.

The second code chunk shows one of the Sweave extensions to the noweb
syntax: code chunk names can be used to pass options to Sweave which
control the final output.
\begin{itemize}
\item The chunk is marked as a figure chunk (\texttt{fig=TRUE}) such
  that Sweave creates a PDF file corresponding to the plot created by
  the commands in the chunk. Furthermore, a
  \verb|\includegraphics{example-1-002}| statement is inserted into
  the \LaTeX{} file (details on the choice of filenames for figures
  follow later in this manual).
\item Option \texttt{echo=FALSE} indicates that the R input should not
  be included in the final document (no \texttt{Sinput} environment).
\end{itemize}

\begin{figure}[htbp]
  \centering
  \begin{minipage}{0.9\textwidth}
    \lstinputlisting{example-1.tex}
  \end{minipage}
  \caption{The output of \texttt{Sweave("example-1.Rnw")} is the file
    \texttt{example-1.tex}.}
  \label{fig:ex1.tex}
\end{figure}

\begin{figure}[htbp]
  \centering
  \fbox{\begin{minipage}{0.8\textwidth}
      \includegraphics[width=\textwidth]{example-1}
    \end{minipage}}
  \caption{The final document is created by running \texttt{latex} on
    \texttt{example-1.tex}.}
  \label{fig:ex1.pdf}
\end{figure}

\subsection{Sweave options}

Options control how code chunks and their output (text, figures) are
transferred from the \texttt{.Rnw} file to the \texttt{.tex} file. All
options have the form \texttt{key=value}, where \texttt{value} can be
a number, string or logical value. Several options can be specified at
once (separated by commas); all options must take a value (which must
not contain a comma or equal sign). Logical options can take the
values \texttt{true}, \texttt{false}, \texttt{t}, \texttt{f} and the
respective uppercase versions.

In the \texttt{.Rnw} file options can be specified either
\begin{enumerate}
\item inside the angle brackets at the beginning of a code chunk,
  modifying the behaviour \emph{only for this chunk}, or
\item anywhere in a documentation chunk using the command
  %
  \begin{quote}
    \verb|\SweaveOpts{opt1=value1, opt2=value2, ..., optN=valueN}|
  \end{quote}
  which modifies the defaults for the rest of the document, i.e.,
  \emph{all code chunks after the statement}. Hence, an
  \verb|\SweaveOpts| statement in the preamble of the document sets
  defaults for all code chunks.
\end{enumerate}

Which options are supported depends on the driver in use. All drivers
should at least support the following options (all options appear
together with their default value, if any):
\begin{description}
\item[split=FALSE:] a logical value. If \texttt{TRUE}, then the output
  is distributed over several files; if \texttt{FALSE}, all output is
  written to a single file.
  Details depend on the driver.
\item[label:] a text label for the code chunk, which is used for
  filename creation when \texttt{split=TRUE}.
\end{description}

The first (and only the first) option in a code chunk name may
optionally be given without a name, in which case it is taken to be a
label. That is, starting a code chunk with
\begin{quote}
  \verb|<<hello, split=FALSE>>|
\end{quote}
is the same as
\begin{quote}
  \verb|<<split=FALSE, label=hello>>|
\end{quote}
but
\begin{quote}
  \verb|<<split=FALSE, hello>>|
\end{quote}
gives a syntax error. Having an unnamed first argument for labels is
needed for noweb compatibility. If only \verb|\SweaveOpts| is used for
setting options, then Sweave files can be written to be fully
compatible with noweb (as only filenames appear in code chunk names).

\subsection{Using scalars in text}

There is limited support for using the values of R objects in text
chunks. Any occurrence of
\verb|\Sexpr{|\texttt{\textit{expr}}\verb|}| is replaced by the string
resulting from coercing the value of the expression \texttt{expr} to a
character vector; only the first element of this vector is used. E.g.,
\verb|\Sexpr{sqrt(9)}| will be replaced by the string \texttt{'3'}
(without any quotes).

The expression is evaluated in the same environment as the code
chunks, hence one can access all objects defined in the code chunks
which have appeared before the expression and were not ignored. The
expression may contain any valid R code, except that curly brackets
are not allowed. This is not really a limitation, because more
complicated computations can easily be done in a hidden code chunk and
the result then used inside a \verb|\Sexpr|.

\subsection{Code chunk reuse}

Named code chunks can be reused in other code chunks following later
in the document.
Consider the simple example
\begin{quote}
\begin{verbatim}
  <<a>>=
  x <- 10
  @

  <<b>>=
  x + y
  @

  <<c>>=
  <<a>>
  y <- 20
  <<b>>
  @
\end{verbatim}
\end{quote}
which is equivalent to defining the last code chunk as
\begin{quote}
\begin{verbatim}
  <<c>>=
  x <- 10
  y <- 20
  x + y
  @
\end{verbatim}
\end{quote}
The chunk reference operator \verb|<<>>| takes only the name of the
chunk as argument, without any additional Sweave options.

\subsection{Syntax definition}

So far we have only talked about Sweave files using noweb syntax
(which is the default). However, Sweave allows the user to redefine
the syntax for marking documentation and code chunks, using scalars in
text, or reusing code chunks.

\begin{figure}[htbp]
  \centering
  \begin{minipage}{0.9\textwidth}
    \lstinputlisting{example-1.Stex}
  \end{minipage}
  \caption{An Sweave file using \LaTeX{} syntax:
    \texttt{example-1.Stex}.}
  \label{fig:ex1.Stex}
\end{figure}

Figure~\ref{fig:ex1.Stex} shows the example from
Figure~\ref{fig:ex1.Rnw} using the \texttt{SweaveSyntaxLatex}
definition. It can be created using
<<>>=
SweaveSyntConv(rnwfile, SweaveSyntaxLatex)
@
Code chunks are now enclosed in \texttt{Scode} environments, and code
chunk reuse is performed using \verb|\Scoderef{chunkname}|. All other
operators are the same as in the noweb-style syntax.

Which syntax is used for a document is determined by the extension of
the input file; files with extension \texttt{.Rtex} or \texttt{.Stex}
are assumed to follow the \LaTeX-style syntax. Alternatively, the
syntax can be changed at any point within the document using the
commands
\begin{quote}
  \verb|\SweaveSyntax{SweaveSyntaxLatex}|
\end{quote}
or
\begin{quote}
  \verb|\SweaveSyntax{SweaveSyntaxNoweb}|
\end{quote}
at the beginning of a line within a documentation chunk. Syntax
definitions are simply lists of regular expressions for several Sweave
commands; see the two default definitions mentioned above for examples
(more detailed instructions will follow once the API has stabilized).
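
The two ways of selecting a syntax described in this section can be
tried from the R prompt. The following is only a sketch: it assumes
that the converted file is written to the current working directory
under the name \texttt{example-1.Stex}, and it passes the syntax by
name in the same way as the \texttt{syntax} argument is used in the
FAQ of this manual.
\begin{verbatim}
  ## locate the example file shipped with package utils
  rnwfile <- system.file("Sweave", "example-1.Rnw", package = "utils")

  ## convert the noweb-style file to LaTeX-style syntax
  ## (assumed to write example-1.Stex into the working directory)
  SweaveSyntConv(rnwfile, SweaveSyntaxLatex)

  ## the .Stex extension selects the LaTeX-style syntax ...
  Sweave("example-1.Stex")

  ## ... while the syntax argument requests a definition explicitly
  Sweave(rnwfile, syntax = "SweaveSyntaxNoweb")
\end{verbatim}
Both calls to \texttt{Sweave()} write \texttt{example-1.tex}, so the
second call overwrites the result of the first.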

\section{Tangling and weaving}

The user frontends of the Sweave system are the two R functions
\texttt{Stangle()} and \texttt{Sweave()}, both of which are contained
in package \texttt{utils}. \texttt{Stangle()} can be used to extract
only the code chunks from an \texttt{.Rnw} file and write them to one
or several files. \texttt{Sweave()} runs the code chunks through R and
replaces them with the respective input and/or output.
\texttt{Stangle()} is actually just a wrapper function for
\texttt{Sweave()}, which uses a tangling instead of a weaving driver
by default. See
<<eval=FALSE>>=
help("Sweave")
@
for more details and arguments of the functions.

\subsection{The \texttt{RweaveLatex} driver}

This driver transforms \texttt{.Rnw} files with \LaTeX{} documentation
chunks and R code chunks to proper \LaTeX{} files (for typesetting
with either standard \texttt{latex} or \texttt{pdflatex}), see
<<eval=FALSE>>=
help("RweaveLatex")
@
for details.

\subsubsection{Writing to separate files}

If \texttt{split} is set to \texttt{TRUE}, then all text corresponding
to code chunks (the \texttt{Sinput} and \texttt{Soutput} environments)
is written to separate files. The filenames are of the form
\texttt{prefix.string-label.tex}; if several code chunks have the same
label, their outputs are concatenated. If a code chunk has no label,
then the number of the chunk is used instead. The same naming scheme
applies to figures.

\subsubsection{\LaTeX{} style file and figure sizes}

The driver automatically inserts a \verb|\usepackage{Sweave.sty}|
command as the last line before the \verb|\begin{document}| statement
of the final \LaTeX{} file if no \verb|\usepackage{Sweave}| is found
in the Sweave source file. This style file defines the environments
\texttt{Sinput} and \texttt{Soutput} for typesetting code chunks.
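
As a rough illustration of these environments, a woven code chunk that
evaluates \texttt{1 + 1} might appear in the \texttt{.tex} file along
the following lines. This is a sketch, not verbatim driver output: the
enclosing \texttt{Schunk} environment and the exact spacing are
assumptions about typical \texttt{RweaveLatex} output and may differ
between versions.
\begin{verbatim}
  \begin{Schunk}
  \begin{Sinput}
  > 1 + 1
  \end{Sinput}
  \begin{Soutput}
  [1] 2
  \end{Soutput}
  \end{Schunk}
\end{verbatim}
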
If you do not want to include the standard style file, e.g., because
you have your own definitions for the \texttt{Sinput} and
\texttt{Soutput} environments in a different place, simply insert a
comment like
\begin{verbatim}
% \usepackage{Sweave}
\end{verbatim}
in the preamble of your \LaTeX{} file; this will prevent automatic
insertion of the line.

\verb|Sweave.sty| also sets the default \emph{\LaTeX{}} figure width
(which is independent of the size of the generated EPS or PDF files).
The current default is
\begin{verbatim}
\setkeys{Gin}{width=0.8\textwidth}
\end{verbatim}
If you want to use another width for the figures that are
automatically generated and included by Sweave, simply add a line
similar to the one above \emph{after} \verb|\begin{document}|. If you
want no default width for figures, insert a
\verb|\usepackage[nogin]{Sweave}| in the header of your file. Note
that a new graphics device is opened for each figure chunk (option
\texttt{fig=TRUE}), hence all graphical parameters of the
\texttt{par()} command must be set in each single figure chunk and are
forgotten after the respective chunk (because the device is closed
when leaving the chunk).

Attention: the width/height parameters of the R graphics devices are
easily confused with the corresponding arguments to the \LaTeX{}
\verb|\includegraphics| command. The Sweave options \texttt{width} and
\texttt{height} are passed to the R graphics devices, and hence affect
the default size of the produced EPS and PDF files. They do not affect
the size of figures in the document; by default these will always be
80\% of the current text width. Use \verb|\setkeys{Gin}| to modify
figure sizes, or use explicit \verb|\includegraphics| commands in
combination with Sweave option \texttt{include=FALSE}.

\subsubsection{Prompts and text width}

By default the driver gets the prompts used for input lines and
continuation lines from R's \texttt{options()} settings.
To set new prompts use something like
\begin{verbatim}
options(prompt="MyR> ", continue="...")
\end{verbatim}
see \texttt{help(options)} for details. Similarly, the text width is
controlled by option \texttt{"width"}.

\subsection{The \texttt{Rtangle} driver}

This driver can be used to extract R code chunks from a \texttt{.Rnw}
file. Code chunks can either be written to one large file or to
separate files (one for each label). The options \texttt{split},
\texttt{prefix}, and \texttt{prefix.string} have the same defaults and
interpretation as for the \texttt{RweaveLatex} driver. Use the
standard noweb command line tool \texttt{notangle} if chunks other
than R code should be extracted. See
<<eval=FALSE>>=
help("Rtangle")
@
for details.

\bibliographystyle{plainnat}
\bibliography{Sweave}

\newpage
\appendix

\section{Frequently Asked Questions}
\label{sec:faq}

% \subsection{Where can I find the manual and other information on
%   Sweave?}
% The newest version of the Sweave manual can always be found at the
% Sweave homepage
% \begin{quote}
% \url{http://www.stat.uni-muenchen.de/~leisch/Sweave}
% \end{quote}
% where you also find several example files, and the lisp and shell
% code snippets of the FAQ. In addition, the homepage has several
% papers on Sweave like the CompStat paper and the 2-part miniseries
% from R News (Issues 2/3 and 2/3).

\subsection{How can I get Emacs to automatically recognize files in
  Sweave format?}

Recent versions of ESS (Emacs Speaks Statistics,
\url{http://ess.R-project.org}) automatically recognize files with
extension \texttt{.Rnw} as Sweave files and turn on the correct modes.
Please follow the instructions on the ESS homepage on how to install
ESS on your computer.

\subsection{Can I run Sweave directly from a shell?}

E.g., for writing makefiles it can be useful to run Sweave directly
from a shell rather than manually starting R and then running Sweave.
This can easily be done using
\begin{verbatim}
R CMD Sweave file.Rnw
\end{verbatim}

% A more elaborate solution which also includes automatically running
% \texttt{latex} has been written by Gregor Gorjanc and is available
% at \url{http://www.bfro.uni-lj.si/MR/ggorjan/software/shell/Sweave.sh}.

\subsection{Why does \LaTeX{} not find my EPS and PDF graphic files
  when the filename contains a dot?}

Sweave uses the standard \LaTeX{} package \texttt{graphicx} to handle
graphic files, which automatically uses EPS files for standard
\LaTeX{} and PDF files for PDF\LaTeX{} if the name of the input file
has no extension, i.e., contains no dots. Hence, you may run into
trouble with graphics handling if the name of your Sweave file
contains extra dots: \file{foo.Rnw} is OK, while \file{foo.bar.Rnw} is
not.

% \subsection{Why does Sweave by default create both EPS and PDF
%   graphic files?}
% The \LaTeX{} package \texttt{graphicx} needs EPS files for plain
% \LaTeX{}, but PDF files for PDF\LaTeX{} (the latter can also handle
% PNG and JPEG files). Sweave automatically creates graphics in EPS
% and PDF format, such that the user can freely run \texttt{latex} or
% \texttt{pdflatex} on the final document as needed.

\subsection{Empty figure chunks give \LaTeX{} errors.}

When a code chunk with \texttt{fig=true} does not call any plotting
functions, invalid EPS and PDF files are created. Sweave cannot know
whether the code in a figure chunk actually plotted something, so it
will try to include the graphics, which is bound to fail.

\subsection{Why do R lattice graphics not work?}

The commands in package \texttt{lattice} behave differently from the
standard plot commands in the \texttt{base} package: lattice commands
return an object of class \texttt{"trellis"}, and the actual plotting
is performed by the \texttt{print} method for the class.
Encapsulating calls to lattice functions in \texttt{print()}
statements should do the trick, e.g.:
\begin{verbatim}
  <<fig=TRUE>>=
  library(lattice)
  print(bwplot(1:10))
  @
\end{verbatim}
should work. Future versions of Sweave may have more automated means
to deal with trellis graphics.

\subsection{How can I get Black \& White lattice graphics?}

What is the most elegant way to specify that strip panels are to have
transparent backgrounds and graphs are to be in black and white when
lattice is being used with Sweave? I would prefer a global option that
stays in effect for multiple plots.

Answer by Deepayan Sarkar: I'd do something like this as part of the
initialization:
\begin{verbatim}
  <<...>>=
  library(lattice)
  ltheme <- canonical.theme(color = FALSE)     ## in-built B&W theme
  ltheme$strip.background$col <- "transparent" ## change strip bg
  lattice.options(default.theme = ltheme)      ## set as default
  @
\end{verbatim}

\subsection{Creating several figures from one figure chunk does not
  work}

Suppose that you want to create several graphs in a loop similar to
\begin{verbatim}
  <<fig=TRUE>>=
  for (i in 1:4) plot(rnorm(100)+i)
  @
\end{verbatim}
This will currently \textbf{not} work, because Sweave allows
\textbf{only one graph} per figure chunk. The simple reason is that
Sweave opens a postscript device before executing the code and closes
it afterwards. If you need to plot in a loop, you have to program it
along the lines of
\begin{verbatim}
  <<results=tex,echo=FALSE>>=
  for(i in 1:4){
    file <- paste("myfile", i, ".eps", sep="")
    postscript(file=file, paper="special", width=6, height=6)
    plot(rnorm(100)+i)
    dev.off()
    cat("\\includegraphics{", file, "}\n\n", sep="")
  }
  @
\end{verbatim}

\subsection{How can I set default \texttt{par()} settings for figure
  chunks?}

Because each EPS and PDF file opens a new device, using \texttt{par()}
has an effect only if it is used inside a figure chunk.
If you want to use the same settings for a series of figures, it is
easier to use a hook function than to repeat the same \texttt{par()}
statement in each figure chunk. The effect of
\begin{verbatim}
options(SweaveHooks=list(fig=function() par(bg="red", fg="blue")))
\end{verbatim}
should be easy to spot. Do not forget to remove the hook at the end of
the Sweave file unless you want to use it as a global option for all
Sweave files.

% \subsection{Running \texttt{latex} fails on Windows}
% If you can create the \file{.tex} file by running
% \texttt{Sweave()} in R, but cannot convert the \file{.tex} file
% to \file{.dvi} or \file{.pdf}, this is most likely caused by a
% space in the path of your R installation. If the path of your R
% installation contains any blank characters (like the default
% \verb|"c:\Program Files\..."| in English versions of Windows), this
% may cause problems, because programs like \texttt{tex} or
% \texttt{latex} cannot handle blanks in paths properly.
% Two possible solutions:
% \begin{enumerate}
% \item Install R in a path not containing any blanks.
% \item Copy the file \file{Sweave.sty} to a directory in your tex
%   path or the directory containing the Sweave file and put a
%   \verb|\usepackage{Sweave}| into the preamble of your Sweave file.
% \end{enumerate}

\subsection{How can I change the formatting of S input and output
  chunks?}

Sweave uses the \texttt{fancyvrb} package for formatting all S code
and text output. \texttt{fancyvrb} is a very powerful and flexible
package that allows fine control over the layout of text in verbatim
environments. If you want to change the default layout, simply read
the \texttt{fancyvrb} documentation and modify the definitions of the
\texttt{Sinput} and \texttt{Soutput} environments in
\file{Sweave.sty}, respectively.

\subsection{How can I change the line length of S input and output?}

Sweave respects the usual way of specifying the desired line length in
S, namely \texttt{options(width)}.
E.g., after \texttt{options(width=40)} lines will be formatted to have
at most 40 characters (if possible).

\subsection{Can I use Sweave for Word files?}

Not directly, but SWord provides similar functionality for Microsoft
Word on Windows platforms.

\subsection{Can I use Sweave for OpenOffice files?}

Yes, package \texttt{odfWeave} provides functions for using Sweave in
combination with OpenOffice Writer rather than \LaTeX.

\subsection{Can I use Sweave for HTML files?}

Yes, package \texttt{R2HTML} provides a driver for using Sweave in
combination with HTML rather than \LaTeX.

\subsection{After loading package \texttt{R2HTML} Sweave doesn't work
  properly!}

Package \texttt{R2HTML} registers an Sweave driver for HTML files, and
after that the syntax for HTML is in the search list before the
default syntax. Using
\begin{verbatim}
options(SweaveSyntax="SweaveSyntaxNoweb")
\end{verbatim}
or calling Sweave like
\begin{verbatim}
Sweave(..., syntax="SweaveSyntaxNoweb")
\end{verbatim}
ensures the default syntax even after loading \texttt{R2HTML}.

\subsection{Why does Sweave delete all comments from the R code? Why
  does it mess up line breaks for continuation lines?}

Sweave runs all code through the R parser. The ``input lines'' you see
are the result of running the code through \texttt{parse()} and
\texttt{deparse()}, which by default discards all comments and
reformats line breaks. If you want to keep the code as it is in the
source file, use
\begin{verbatim}
\SweaveOpts{keep.source=TRUE}
\end{verbatim}

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: