\documentclass[a4paper]{article}

%\VignetteIndexEntry{Sweave User Manual}
%\VignettePackage{utils}
%\VignetteDepends{tools}
%\VignetteDepends{datasets}
%\VignetteDepends{stats}

\title{Sweave User Manual}
\author{Friedrich Leisch}

\usepackage[round]{natbib}
\usepackage{graphicx,Rd}
\usepackage{listings}

\lstset{frame=trbl,basicstyle=\small\tt}

\sloppy

\begin{document}

\maketitle

\section{Introduction}
\label{sec:intro}

Sweave provides a flexible framework for mixing text and R code for
automatic document generation. A single source file contains both
documentation text and R code, which are then \emph{woven} into a
final document containing
\begin{itemize}
 \item the documentation text together with
 \item the R code and/or 
 \item the output of the code (text, graphs) 
\end{itemize}
This allows a report to be regenerated if the input data change and
documents the code needed to reproduce the analysis in the same file that
also contains the report. The R code of the complete analysis is
embedded into a \LaTeX{} document\footnote{\url{http://www.ctan.org}}
using the noweb syntax \citep{flm:Ramsey:1998} which is usually used
for literate programming \citep{fla:Knuth:1984}.  Hence, the full power
of \LaTeX{} (for high-quality typesetting) and R (for data analysis)
can be used simultaneously. See \cite{e1071-papers:Leisch:2002} and
references therein for more general thoughts on dynamic report
generation and pointers to other systems.

Sweave has a modular design in which different drivers perform the actual
translations. Obviously, different drivers are needed for different
text markup languages (\LaTeX{}, HTML, \ldots). Several packages on
CRAN provide support for other word processing systems.


\section{Noweb files}
\label{sec:noweb}

Noweb \citep{flm:Ramsey:1998} is a simple literate-programming tool
that combines program source code and the corresponding
documentation in a single file. Separate programs extract the
documentation and/or the source code. A noweb file is a simple text file
consisting of a sequence of code and documentation segments, which
are called \emph{chunks}:
\begin{description}
 \item[Documentation chunks] start with a
  line that has an at sign (\verb|@|) as first character, followed by a space
  or newline character. The rest of this line is a comment and ignored. 
  Typically documentation chunks will contain text in a
  markup language like \LaTeX{}.
  
 \item[Code chunks] start with \verb|<<name>>=| at the
  beginning of a line; again the rest of the line is a comment and
  ignored.
\end{description}
The default for the first chunk is documentation.

In the simplest usage of noweb, the (optional) names of code chunks
give the name of source code files, and the tool \texttt{notangle} can
be used to extract the code chunk from the noweb file. Multiple code
chunks can have the same name; the corresponding code chunks are then
concatenated when the source code is extracted. Noweb has some
additional mechanisms to cross-reference code chunks (the
\verb|[[...]]| operator, etc.). Sweave currently does not use or
support these features, hence they are not described here.
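
As a minimal sketch of this layout, a noweb file with one documentation
chunk and one named code chunk might look as follows (the file name
\texttt{hello.R} and the chunk contents are made up for illustration;
the chunk markers are indented by one space here only so that they are
shown verbatim, in a real file they must start in the first column):
\begin{verbatim}
 @ Everything after the at sign on this line is a comment.
 This paragraph is documentation text.

 <<hello.R>>=
 print("Hello, world!")
 @
\end{verbatim}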


\section{Sweave files}
\label{sec:sweavefile}

\subsection{A simple example}

Sweave source files are regular noweb files with some additional
syntax that allows finer control over the final
output. Traditional noweb files have the extension \texttt{.nw},
which is also fine for Sweave files (and fully supported by the
software). Additionally, Sweave currently recognizes files with
extensions \texttt{.rnw}, \texttt{.Rnw}, \texttt{.snw} and
\texttt{.Snw} to directly indicate a noweb file with Sweave
extensions. We will use \texttt{.Rnw} throughout this document.
 
A minimal Sweave file is shown in Figure~\ref{fig:ex1.Rnw}, which
contains two code chunks embedded in a simple \LaTeX{}
document. Running 
<<>>=
rnwfile <- system.file("Sweave", "example-1.Rnw", package="utils")
Sweave(rnwfile)
@ 
translates this into the \LaTeX{} document shown in
Figures~\ref{fig:ex1.tex} and~\ref{fig:ex1.pdf}. The latter can also
be created directly from within R using
<<>>=
library("tools")
texi2dvi("example-1.tex", pdf=TRUE)
@ 

The first difference between \texttt{example-1.Rnw} and
\texttt{example-1.tex} is that the \LaTeX{} style file
\texttt{Sweave.sty} is automatically loaded, which provides
environments for typesetting R input and output (the \LaTeX{}
environments \texttt{Sinput} and \texttt{Soutput}). Otherwise, the
documentation chunks are copied without any modification from
\texttt{example-1.Rnw} to \texttt{example-1.tex}.

\begin{figure}[htbp]
  \centering
  \begin{minipage}{0.9\textwidth}
    \lstinputlisting{\Sexpr{rnwfile}}
  \end{minipage}
  \caption{A minimal Sweave file: \texttt{example-1.Rnw}.}
  \label{fig:ex1.Rnw}
\end{figure}

The real work of Sweave is done on the code chunks: The first code
chunk has no name, hence the default behavior of Sweave is used, which
transfers both the R commands and their respective output to the \LaTeX{}
file, embedded in \texttt{Sinput} and \texttt{Soutput} environments,
respectively.

The second code chunk shows one of the Sweave extensions
to the noweb syntax: Code chunk names can be used to pass options to
Sweave which control the final output.
\begin{itemize}
 \item The chunk is marked as a figure chunk (\texttt{fig=TRUE}) such
that Sweave creates a PDF file corresponding to the plot
created by the commands in the chunk. Furthermore, a
\verb|\includegraphics{example-1-002}| statement is inserted into the
\LaTeX{} file (details on the choice of filenames for figures follow
later in this manual).
 \item Option \texttt{echo=FALSE} indicates that the R
  input should not be included in the final document (no \texttt{Sinput}
  environment). 
\end{itemize}

\begin{figure}[htbp]
  \centering
  \begin{minipage}{0.9\textwidth}
    \lstinputlisting{example-1.tex}
  \end{minipage}
  \caption{The output of \texttt{Sweave("example-1.Rnw")} is the file
    \texttt{example-1.tex}.}
  \label{fig:ex1.tex}
\end{figure}

\begin{figure}[htbp]
  \centering
  \fbox{\begin{minipage}{0.8\textwidth}
    \includegraphics[width=\textwidth]{example-1}
  \end{minipage}}
  \caption{The final document is created by running \texttt{latex} on
  \texttt{example-1.tex}.} 
  \label{fig:ex1.pdf}
\end{figure}


\subsection{Sweave options}

Options control how code chunks and their output (text, figures) are
transferred from the \texttt{.Rnw} file to the \texttt{.tex} file. All
options have the form \texttt{key=value}, where \texttt{value} can be
a number, string or logical value. Several options can be specified at
once (separated by commas), and every option must take a value (which must
not contain a comma or equal sign). Logical options can take the
values \texttt{true}, \texttt{false}, \texttt{t}, \texttt{f} and the
respective uppercase versions.

In the \texttt{.Rnw} file options can be specified either
\begin{enumerate}
 \item inside the angle brackets at the beginning of a code chunk,
  modifying the behaviour \emph{only for this chunk}, or
 \item anywhere in a documentation chunk using the command %
  \begin{quote}
    \verb|\SweaveOpts{opt1=value1, opt2=value2, ..., optN=valueN}|
  \end{quote}
  which modifies the defaults for the rest of the document, i.e.,
  \emph{all code chunks after the statement}. Hence, an
  \verb|\SweaveOpts| statement in the preamble of the document sets
  defaults for all code chunks.
\end{enumerate}

Which options are supported depends on the driver in use. All drivers
should at least support the following options (all options appear
together with their default value, if any):
\begin{description}
  \item[split=FALSE:] a logical value. If \texttt{TRUE}, then the output is
   distributed over several files, if \texttt{FALSE} all output is
   written to a single file. Details depend on the driver.
  \item[label:] a text label for the code chunk, which is used for
   filename creation when \texttt{split=TRUE}.
\end{description}

The first (and only the first) option in a code chunk name may be
given without a name, in which case it is taken to be a label. That is,
starting a code chunk with
\begin{quote}
  \verb|<<hello, split=FALSE>>|
\end{quote}
is the same as
\begin{quote}
  \verb|<<split=FALSE, label=hello>>|
\end{quote}
but
\begin{quote}
  \verb|<<split=FALSE, hello>>|
\end{quote}
gives a syntax error. Having an unnamed first argument for labels is
needed for noweb compatibility. If only \verb|\SweaveOpts| is used
for setting options, then Sweave files can be written to be fully
compatible with noweb (as only filenames appear in code chunk names).

\subsection{Using scalars in text}

There is limited support for using the values of R objects in
text chunks. Any occurrence of \verb|\Sexpr{|\texttt{\textit{expr}}\verb|}|
is replaced by the
string resulting from coercing the value of the expression \texttt{expr}
to a character vector; only the first element of this vector is
used. E.g., \verb|\Sexpr{sqrt(9)}| will be replaced
by the string \texttt{'3'} (without any quotes).

The expression is evaluated in the same environment as the code
chunks, hence one can access all objects defined in the code chunks
which have appeared before the expression and were not ignored. The
expression may contain any valid R code, except that curly brackets are not
allowed. This is not really a limitation, because more complicated
computations can easily be done in a hidden code chunk and the result
then used inside a \verb|\Sexpr|.
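
For instance, the coercion rule can be mimicked at the R prompt. The
following lines are only an illustration of the rule described above,
not the code Sweave itself runs, and the object name \texttt{avg} is
hypothetical:
\begin{verbatim}
avg <- mean(c(2, 4, 6))        # computed in a (possibly hidden) chunk
as.character(c(avg, 10))[1]    # only the first element is used
\end{verbatim}
A later \verb|\Sexpr| command referring to \texttt{avg} would then
insert the corresponding string into the text.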

\subsection{Code chunk reuse}

Named code chunks can be reused in other code chunks following later
in the document. Consider the simple example
\begin{quote}
\begin{verbatim}
 <<a>>=
 x <- 10
 @

 <<b>>=
 x + y
 @

 <<c>>=
 <<a>>
 y <- 20
 <<b>>
 @
\end{verbatim}
\end{quote}
which is equivalent to defining the last code chunk as 
\begin{quote}
\begin{verbatim}
 <<c>>=
 x <- 10
 y <- 20
 x + y
 @
\end{verbatim}
\end{quote}

The chunk reference operator \verb|<<>>| takes only the name of the
chunk as argument, without any additional Sweave options.

\subsection{Syntax definition}

So far we have only talked about Sweave files using noweb syntax
(which is the default). However, Sweave allows the user to redefine
the syntax marking documentation and code chunks, using scalars in
text or reusing code chunks. 

\begin{figure}[htbp]
  \centering
  \begin{minipage}{0.9\textwidth}
    \lstinputlisting{example-1.Stex}
  \end{minipage}
  \caption{An Sweave file using \LaTeX{} syntax: \texttt{example-1.Stex}.}
  \label{fig:ex1.Stex}
\end{figure}

Figure~\ref{fig:ex1.Stex} shows the example from
Figure~\ref{fig:ex1.Rnw} using the \texttt{SweaveSyntaxLatex}
definition. It can be created using
<<>>=
SweaveSyntConv(rnwfile, SweaveSyntaxLatex)
@ 

Code chunks are now enclosed in \texttt{Scode}
environments, and code chunk reuse is performed using
\verb|\Scoderef{chunkname}|. All other operators are the same as in
the noweb-style syntax.

Which syntax is used for a document is determined by the extension of
the input file: files with extension \texttt{.Rtex} or \texttt{.Stex}
are assumed to follow the \LaTeX-style syntax. Alternatively the
syntax can be changed at any point within the document using the
commands
\begin{quote}
  \verb|\SweaveSyntax{SweaveSyntaxLatex}|  
\end{quote}
or
\begin{quote}
  \verb|\SweaveSyntax{SweaveSyntaxNoweb}|
\end{quote}
 at the beginning
of a line within a documentation chunk. Syntax definitions are simply
lists of regular expressions for several Sweave commands; see the two
default definitions mentioned above for examples (more detailed
instructions will follow once the API has stabilized).


\section{Tangling and weaving}

The user frontends of the Sweave system are the two R functions
\texttt{Stangle()} and \texttt{Sweave()}, both of which are contained in
package \texttt{utils}.  \texttt{Stangle()} can be used to extract only
the code chunks from an \texttt{.Rnw} file and write them to one or several
files.  \texttt{Sweave()} runs the code chunks through R and replaces
them with the respective input and/or output.  \texttt{Stangle} is
actually just a wrapper function for Sweave, which uses a tangling
instead of a weaving driver by default. See
<<eval=FALSE>>=
help("Sweave")
@ 
for more details and arguments of the functions.
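
As a brief sketch, both frontends can be applied to the example file
from Section~\ref{sec:sweavefile}. The calls below use only
\texttt{Sweave()}, \texttt{Stangle()} and the \texttt{Rtangle()}
driver; the names of the files written depend on the driver and its
options:
\begin{verbatim}
rnwfile <- system.file("Sweave", "example-1.Rnw", package="utils")
Sweave(rnwfile)                      # weave: run code, write LaTeX
Stangle(rnwfile)                     # tangle: extract the R code
Sweave(rnwfile, driver=Rtangle())    # tangling via the generic frontend
\end{verbatim}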

\subsection{The \texttt{RweaveLatex} driver}

This driver transforms \texttt{.Rnw} files with \LaTeX{} documentation
chunks and R code chunks to proper \LaTeX{} files (for typesetting
with either standard \texttt{latex} or \texttt{pdflatex}); see
<<eval=FALSE>>=
help("RweaveLatex")
@ 
for details.

\subsubsection{Writing to separate files}

If \texttt{split} is set to \texttt{TRUE}, then all text corresponding
to code chunks (the \texttt{Sinput} and \texttt{Soutput} environments)
is written to separate files. The filenames are of the form
\texttt{prefix.string-label.tex}; if several code chunks have the same
label, their outputs are concatenated. If a code chunk has no label,
then the number of the chunk is used instead. The same naming scheme
applies to figures.
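
As a sketch, assume a source file \texttt{report.Rnw} (a hypothetical
name) whose prefix string is \texttt{report}, containing the chunk
\begin{verbatim}
 <<summary, split=TRUE>>=
 summary(cars)
 @
\end{verbatim}
Under the naming scheme above, the input and output of this chunk would
be written to \texttt{report-summary.tex} rather than to the main
\texttt{.tex} file.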

\subsubsection{\LaTeX{} style file and figure sizes} 

The driver automatically inserts a \verb|\usepackage{Sweave.sty}|
command as last line before the \verb|\begin{document}| statement of
the final \LaTeX{} file if no \verb|\usepackage{Sweave}| is found in the
Sweave source file. This style file defines the environments
\texttt{Sinput} and \texttt{Soutput} for typesetting code chunks. If
you do not want to include the standard style file, e.g., because you
have your own definitions for the \texttt{Sinput} and \texttt{Soutput}
environments in a different place, simply insert a comment like
\begin{verbatim}
% \usepackage{Sweave}
\end{verbatim}
in the preamble of your \LaTeX{} file; this will prevent automatic
insertion of the line.

\verb|Sweave.sty| also sets the default \emph{\LaTeX{}} figure width
(which is independent of the size of the generated EPS or PDF files).
The current default is
\begin{verbatim}
\setkeys{Gin}{width=0.8\textwidth}
\end{verbatim}
If you want to use another width for the figures that are automatically
generated and included by Sweave, simply add a line similar to the
one above \emph{after} \verb|\begin{document}|. If you want no default
width for figures, insert a \verb|\usepackage[nogin]{Sweave}| in
the preamble of your file.
Note that a new graphics device is opened for each figure chunk
(option \texttt{fig=TRUE}), hence all graphical parameters of the
\texttt{par()} command must be set in each single figure chunk and are
forgotten after the respective chunk (because the device is closed
when leaving the chunk).

Attention: The width/height
parameters of the R graphics devices are easily confused with the
corresponding arguments to the \LaTeX{} \verb|\includegraphics|
command. The Sweave options
\texttt{width} and \texttt{height} are passed to the R graphics
devices, and hence affect the default size of the produced EPS and PDF
files. They do not affect the size of figures in the document; by
default these will always be 80\% of the current text width. Use
\verb|\setkeys{Gin}| to modify figure sizes or use explicit
\verb|\includegraphics| commands in combination with Sweave option
\texttt{include=FALSE}.
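
The following sketch shows the second approach; the file name
\texttt{report-scatter} assumes a hypothetical source file
\texttt{report.Rnw} and follows the naming scheme for figures described
above:
\begin{verbatim}
 <<scatter, fig=TRUE, include=FALSE, width=6, height=4>>=
 plot(rnorm(100))
 @
 \includegraphics[width=0.5\textwidth]{report-scatter}
\end{verbatim}
Here \texttt{width} and \texttt{height} set the size of the graphics
device (in inches), while the \verb|\includegraphics| argument alone
determines the size of the figure in the document.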

\subsubsection{Prompts and text width}

By default the driver gets the prompts used for input lines
and continuation lines from R's \texttt{options()} settings. To set new
prompts use something like
\begin{verbatim}
options(prompt="MyR> ", continue="...")
\end{verbatim}
see \texttt{help(options)} for details. Similarly the text width is
controlled by option \texttt{"width"}.

\subsection{The \texttt{Rtangle} driver}

This driver can be used to extract R code chunks from a \texttt{.Rnw}
file. Code chunks can be written either to one large file or to separate
files (one for each label). The options \texttt{split},
\texttt{prefix}, and \texttt{prefix.string} have the same defaults and
interpretation as for the \texttt{RweaveLatex} driver. Use the
standard noweb command line tool \texttt{notangle} if chunks other
than R code should be extracted. See
<<eval=FALSE>>=
help("Rtangle")
@ 
for details.

\bibliographystyle{plainnat}
\bibliography{Sweave}

\newpage


\appendix


\section{Frequently Asked Questions}
\label{sec:faq}


% \subsection{Where can I find the manual and other information on
%    Sweave?}
 
%   The newest version of the Sweave manual can always be found at the
%   Sweave homepage 
%   \begin{quote}
%     \url{http://www.stat.uni-muenchen.de/~leisch/Sweave}
%   \end{quote}
%   where you also find several example files, and the lisp and shell
%   code snippets of the FAQ. In addition, the homepage has several
%   papers on Sweave like the CompStat paper and the 2-part miniseries
%   from R News (Issues 2/3 and 2/3).
  
  
  \subsection{How can I get Emacs to automatically recognize files
    in Sweave format?}

  Recent versions of ESS (Emacs Speaks Statistics,
  \url{http://ess.R-project.org}) automatically recognize files with
  extension \texttt{.Rnw} as Sweave files and turn on the correct
  modes. Please follow the instructions on the ESS homepage on how to
  install ESS on your computer.
  
  \subsection{Can I run Sweave directly from a shell?}

   E.g., for writing makefiles it can be useful to run Sweave directly
   from a shell rather than manually starting R and then running Sweave. This
   can easily be done using
\begin{verbatim}
R CMD Sweave file.Rnw
\end{verbatim}

   % A more elaborate solution which also includes automatically running
   % \texttt{latex} has been written by Gregor Gorjanc and is available
   % at \url{http://www.bfro.uni-lj.si/MR/ggorjan/software/shell/Sweave.sh}.

  \subsection{Why does \LaTeX{} not find my EPS and PDF graphic files when
     the filename contains a dot?}
  
   Sweave uses the standard \LaTeX{} package \texttt{graphicx} to handle
   graphic files, which automatically uses EPS files for standard
   \LaTeX{} and PDF files for PDF\LaTeX{}, if the name of the input file
   has no extension, i.e., contains no dots. Hence, you may run into
   trouble with graphics handling if the name of your Sweave file
   contains extra dots: \file{foo.Rnw} is OK, while \file{foo.bar.Rnw}
   is not.

  % \subsection{Why does Sweave by default create both EPS and PDF
  %    graphic files?}
   
  %  The \LaTeX{} package \texttt{graphicx} needs EPS files for plain
  %  \LaTeX{}, but PDF files for PDF\LaTeX{} (the latter can also handle
  %  PNG and JPEG files). Sweave automatically creates graphics in EPS
  %  and PDF format, such that the user can freely run \texttt{latex} or
  %  \texttt{pdflatex} on the final document as needed.

   \subsection{Empty figure chunks give \LaTeX{} errors.}

   When a code chunk with \texttt{fig=true} does not call any plotting
   functions, invalid EPS and PDF files are created. Sweave cannot know
   if the code in a figure chunk actually plotted something or not, so
   it will try to include the graphics, which is bound to fail.
   
  \subsection{Why do R lattice graphics not work?}
   
   The commands in package \texttt{lattice} have different behavior
   than the standard plot commands in the \texttt{base} package:
   lattice commands return an object of class \texttt{"trellis"}, the
   actual plotting is performed by the \texttt{print} method for the
   class. Encapsulating calls to lattice functions in \texttt{print()}
   statements should do the trick, e.g.:
\begin{verbatim}
 <<fig=TRUE>>=
 library(lattice)
 print(bwplot(1:10))
 @
\end{verbatim}
   should work. Future versions of Sweave may have more automated means
   to deal with trellis graphics.


   \subsection{How can I get Black \& White lattice graphics?}

   What is the most elegant way to specify that strip panels are to have 
   transparent backgrounds and graphs are to be in black and white when 
   lattice is being used with Sweave?  I would prefer a global option that 
   stays in effect for multiple plots.

   Answer by Deepayan Sarkar: I'd do something like this as part of
   the initialization:
\begin{verbatim}   
 <<...>>=
 library(lattice)
 ltheme <- canonical.theme(color = FALSE)     ## in-built B&W theme
 ltheme$strip.background$col <- "transparent" ## change strip bg
 lattice.options(default.theme = ltheme)      ## set as default
 @
\end{verbatim}
   
  \subsection{Creating several figures from one figure chunk does
     not work}

   Suppose that you want to create several graphs in a loop similar
   to
\begin{verbatim}
 <<fig=TRUE>>=
 for (i in 1:4) plot(rnorm(100)+i)
 @
\end{verbatim}
   This will currently \textbf{not} work, because Sweave allows
   \textbf{only one graph} per figure chunk. The simple reason is that
   Sweave opens a postscript device before executing the code and
   closes it afterwards. If you need to plot in a loop, you have to
   program it along the lines of
\begin{verbatim}
 <<results=tex,echo=FALSE>>=
 for(i in 1:4){
    file=paste("myfile", i, ".eps", sep="")
    postscript(file=file, paper="special", width=6, height=6)
    plot(rnorm(100)+i)
    dev.off()
    cat("\\includegraphics{", file, "}\n\n", sep="")
 }
 @ 
\end{verbatim}
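
   For use with PDF\LaTeX{}, a variant of this loop could open a
   \texttt{pdf()} device instead. This is only a sketch along the same
   lines; the file names are again arbitrary:
\begin{verbatim}
 <<results=tex,echo=FALSE>>=
 for(i in 1:4){
    file <- paste("myfile", i, ".pdf", sep="")
    pdf(file=file, width=6, height=6)
    plot(rnorm(100)+i)
    dev.off()
    cat("\\includegraphics{", file, "}\n\n", sep="")
 }
 @
\end{verbatim}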


 \subsection{How can I set default \texttt{par()} settings for figure
    chunks?}

  Because each EPS and PDF file opens a new device, using \texttt{par()}
  has an effect only if it is used inside a figure chunk. If you want
  to use the same settings for a series of figures, it is easier to
  use a hook function than to repeat the same \texttt{par()} statement
  in each figure chunk.

  The effect of
\begin{verbatim}
  options(SweaveHooks=list(fig=function() par(bg="red", fg="blue")))
\end{verbatim}
  should be easy to spot. Do not forget to remove the hook at the end
  of the Sweave file unless you want to use it as a global option for
  all Sweave files.
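
  A minimal sketch of the complete pattern is given below; the particular
  \texttt{par()} settings are arbitrary choices for illustration:
\begin{verbatim}
  ## at the start of the Sweave file: run before each figure chunk
  options(SweaveHooks=list(fig=function() par(mar=c(4, 4, 1, 1))))

  ## at the end of the Sweave file: remove the hook again
  options(SweaveHooks=NULL)
\end{verbatim}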

  % \subsection{Running \texttt{latex} fails on Windows}
   
  %  If you can create the \file{.tex} file by running
  %  \texttt{Sweave()} in R, but cannot convert the \file{.tex} file
  %  to \file{.dvi} or \file{.pdf}, this is most likely caused by a
  %  space in the path of your R installation.  If the path of your R
  %  installation contains any blank characters (like the default
  %  \verb|"c:\Program Files\..."| in English versions of Windows), this
  %  may cause problems, because programs like \texttt{tex} or
  %  \texttt{latex} cannot handle blanks in paths properly.
   
  %  Two possible solutions:
  %  \begin{enumerate}
  %   \item Install R in a path not containing any blanks. 
  %   \item Copy the file \file{Sweave.sty} to a directory in your tex
  %    path or the directory containing the Sweave file and put a
  %    \verb|\usepackage{Sweave}| into the preamble of your Sweave file.
  %  \end{enumerate}

  \subsection{How can I change the formatting of S input and output
     chunks?}
   
   Sweave uses the \texttt{fancyvrb} package for formatting all S code
   and text output. \texttt{fancyvrb} is a very powerful and flexible
   package that allows fine control over the layout of text in verbatim
   environments. If you want to change the default layout, simply read
   the \texttt{fancyvrb} documentation and modify the definitions of
   the \texttt{Sinput} and \texttt{Soutput} environments in
   \file{Sweave.sty}, respectively.


  \subsection{How can I change the line length of S input and
     output?}
   
   Sweave respects the usual way of specifying the desired line length
   in S, namely \texttt{options(width)}. E.g., after
   \texttt{options(width=40)} lines will be formatted to have at most 40
   characters (if possible).


  \subsection{Can I use Sweave for Word files?}

  Not directly, but SWord provides similar functionality for Microsoft
  Word on Windows platforms.
   
  \subsection{Can I use Sweave for OpenOffice files?}

   Yes, package \texttt{odfWeave} provides functions for using Sweave in
   combination with OpenOffice Writer rather than \LaTeX.
   
   \subsection{Can I use Sweave for HTML files?}

   Yes, package \texttt{R2HTML} provides a driver for using Sweave in
   combination with HTML rather than \LaTeX.
   
  \subsection{After loading package \texttt{R2HTML} Sweave doesn't
     work properly!}
   
   Package \texttt{R2HTML} registers an Sweave driver for HTML
   files, and after that the Syntax for HTML is in the search list
   before the default syntax. Either setting
\begin{verbatim}
  options(SweaveSyntax="SweaveSyntaxNoweb")   
\end{verbatim}
or calling Sweave like
\begin{verbatim}
  Sweave(..., syntax="SweaveSyntaxNoweb")
\end{verbatim}
ensures the default syntax even after loading \texttt{R2HTML}.


\subsection{Why does Sweave delete all comments from the R code? Why
  does it mess up line breaks for continuation lines?}

Sweave runs all code through the R parser. The ``input lines'' you see
are the result from running the code through \texttt{parse()} and
\texttt{deparse()}, which by default discards all comments and
reformats line breaks. If you want to keep the code as it is in the
source file, use
\begin{verbatim}
  \SweaveOpts{keep.source=TRUE}
\end{verbatim}


\end{document}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: t
%%% End: