Frequently Asked Questions about S. $Revision: 1.24 $
This document contains answers to some of the most frequently asked
questions in the S-news mailing list and sci.stat.math newsgroup about
S and S-PLUS. They're all good questions, but they come up often enough
that substantial net bandwidth can be saved by looking here first before
asking.
MathSoft now has its own Web site, which provides useful material that
is kept more current than the information provided here. The URL for
this site is http://www.mathsoft.com/.
This list is currently available in ASCII, TeXInfo, and HTML versions.
These are maintained by Charles Roosen <charles@playfair.stanford.edu>
with the gracious assistance of Martin Maechler and many subscribers to
the S-news mailing list.
  * ASCII: /S/faq on statlib. For information on accessing statlib, see
    section What is the statlib server? How can I access it?.
  * TeXInfo: /S/faq.texinfo on statlib.
  * HTML: http://www.stat.math.ethz.ch/S-FAQ.
The primary document is the TeXInfo file. The HTML version is
automatically produced from the TeXInfo version using texi2html. The
ASCII version is produced from the TeXInfo using makeinfo.
These documents are also available by anonymous ftp from
ftp.stat.math.ethz.ch in directory /pub/Doc. Note that unless we develop
a script to automatically update the files at statlib, the versions at
ftp.stat.math.ethz.ch are more current than those at statlib.
If you have any suggested additions or corrections, please send them to
Charles Roosen <charles@playfair.stanford.edu> with a subject line of
the form S-faq Suggestion: [Insert Topic Here].
Special thanks to Rick Becker and Brian Ripley for their patience in
looking over and making suggestions and corrections to the many attempts
that led to this faq. Bill Dunlap and Pat Burns were kind enough to go
through and suggest modifications. Thanks are also due to the members
of the S-news newsgroup, too many to mention, who contributed
suggestions and questions for this FAQ. Portions of this faq were
shamelessly lifted from a similar document provided by Rick Becker and
previous answers on this net.
Some general questions, though stated in terms of S only, are applicable to both S and S-PLUS.
S-news mailing list.
Requests to be added or dropped from the S-news list should be sent to
the electronic mail address s-news-request@utstat.toronto.edu. Sending a
one line message saying either subscribe or unsubscribe will suffice.
Please do not send mail to s-news@utstat.toronto.edu since any mail sent
to this address is automatically forwarded to thousands of other users.
Your local site may have many subscribers to S-news. In that case, it
may have a local mailing list, so that a single message from
utstat.toronto.edu is sent to the site and propagated to the local list.
In this case please contact your system administrator to be added to or
dropped from the local S-news list.
Due to the existence of these local lists, S-news-request is
administered manually. Hence subscribe and unsubscribe requests are
not fulfilled instantaneously. They will be handled in a timely manner,
so please be patient.
Before you ask a question of the mailing list, check with your local
S expert. If you don't have any luck there, read through this FAQ.
If you still don't have an answer to your question, go ahead and send
mail to s-news@utstat.toronto.edu.
When asking questions please specify (if relevant):
  * the version of S or S-PLUS you are running (S VERSION or Splus
    VERSION).
  * the machine and operating system (uname -a for Unix enthusiasts).
When answers are sent to you individually and not to the mailing list, it is considered good etiquette to summarize the answers and mail them to the newsgroup.
Try to make your answer broad enough that people other than the original poster may benefit from it. If you consider your answer to have broad interest, you may want to post it to the newsgroup instead of replying directly to the individual who asked. In this case, please make sure that your answer is not a duplication of a previous answer. Because of the manner in which the newsgroup is distributed, messages arrive in different orders at different sites, so don't assume that the message you are answering will arrive before yours. Try to summarize the essential point of the question before your reply, but don't feel obliged to quote the whole question.
S is a very high level language and an environment for data analysis and graphics. S was written by Richard A. Becker, John M. Chambers, and Allan R. Wilks of AT&T Bell Laboratories Statistics Research Department. More recently, other Bell Labs researchers have made major contributions to a new modeling capability in S. The S language is the form in which S users express their computations. The environment provides facilities for data management, support for many graphics devices, etc. S is useful for computation in a wide range of applications. It's a very general tool, so that applications are not restricted to any particular subject area. One way to think of it is to imagine the wide range of applications that can be handled by a spreadsheet program--but think of an even broader range of applications because S is much more flexible for complex computations. As examples, S has been used for computing in business, finance, experimental science, etc. The authors of S prefer that you not call S a statistics package. Most of the people who use S have no attachment to statistics, and most of the S applications involve basic quantitative computations and graphics. "Package" really doesn't apply well to S, either. The word "package" often connotes a collection of unrelated tools put together under one name; this is unlike S, where all functions are tightly integrated and controlled by the S language.
The current S version is the April, 1992 version. A new release is currently being Beta tested and will be released commercially in 1998. This version (Version 4) will incorporate significant changes to the language, yet will be backwards compatible with the existing version of S.
S-PLUS is a value-added version of S sold by MathSoft, Inc. S-PLUS is a fully supported and documented application which has been compiled and tested on numerous architectures. It is available in both UNIX and Windows versions.
S is a subset of S-PLUS, and hence anything which may be done in S may be done in S-PLUS. In addition, S-PLUS has extended functionality in a wide variety of areas, including robust regression, modern nonparametric regression, time series, survival analysis, multivariate analysis, classical statistical tests, quality control, and graphics drivers. Add-on modules provide further capabilities for wavelet analysis, spatial statistics, and design of experiments. S-PLUS 4.0 for Windows also introduces a full-featured graphical user interface.
The current version of S-PLUS is Version 3.4 on UNIX and 4.0 on Windows. For a complete table of operating systems for which S-PLUS is available, see section What machines does S/S-PLUS run on?.
The book, "The New S Language", was published in 1988 and introduced the modern version of S that is known today. The version of S prior to that (described in "S: An Interactive Environment for Data Analysis and Graphics") is now called old S and is defunct.
The source code for S is licensed by AT&T Bell Laboratories, but is distributed exclusively by StatSci. For information on contacting StatSci, see section How do I get S-PLUS?.
For each release of S, there is only one version of the source code; it contains instructions for compiling on a variety of platforms. For more information, see section Should I get S in source form or binary?.
If you really want to read (and/or modify) the C and Fortran code that makes up S, then you need a source license. For most people who want to run S, however, a binary version of S is more convenient. It has already been compiled and specialized to a particular machine and thus it can be installed very easily. It may support specialized graphics devices for that machine. It will also be much less expensive, both in initial cost and because you don't need to purchase C and Fortran compilers in order to process the source code.
S is written primarily in the C language, but it also makes use of Fortran subroutines to carry out various numerical algorithms. Thus, in order to compile the S source code, you need both C and Fortran compilers, (and the compilers must be compatible with one another so that C programs can call Fortran programs, and vice versa.) It may be possible to use a Fortran-to-C translator rather than a compatible Fortran compiler.
At one point in time a binary version of S known as SUCCESS was available for some machines. Currently S is only available in binary from StatSci as a component of S-PLUS.
For more information, see section What software do I need to go along with S?, section How does one install S? How long does it take?, section How much disk space is necessary to install S?, section How much disk space is necessary to install S-PLUS?, section What is the best machine for S?, and section What operating system does S need?.
There are various prices for S, depending on whether you get it in source form or in binary form. The binary version price also depends on the particular machine it is targeted for. There is also an educational price schedule and different prices in different countries. To get current prices appropriate to your situation, talk to a sales representative.
For more information, see section How do I get S?.
Prices for S-PLUS, too, cannot be specified because of country dependence and discounts for non-profit and academic users.
For more info, see section How do I get S-PLUS?.
An electronic source for information on S-PLUS is the MathSoft Web site.
The URL for this site is http://www.mathsoft.com/.
You can get S-PLUS in North America from:
  MathSoft
  1700 Westlake Ave North, Suite 500
  Seattle, WA 98109
  (800) 569-0123 toll free in the United States and Canada
  (206) 283-8802
  (206) 283-8691 fax
  E-Mail: mktg@statsci.com
Sales and support outside of North America are provided through distributors. Contact information for local distributors may be obtained from MathSoft at:
  MathSoft
  Knightway House
  Park Street
  Bagshot GU19 5AQ
  England
  +44 276 452299
  +44 276 451224
  shelp@mathsoft.co.uk
No, MathSoft does not supply source code for S-PLUS.
At one point in time a binary version of S known as SUCCESS was available for some machines. However, this product is no longer produced. MathSoft is the exclusive U.S. distributor of S and S-PLUS. There are local distributors in a number of other countries. MathSoft in Seattle will forward requests to the relevant distributor.
If you run a binary version of S, such as S-PLUS, then you will
be supplied with all the software necessary for using the system.
You may find it useful to have C or Fortran compilers in order
to use the .C() and .Fortran() functions in S; these allow you to
add your own compiled algorithms to S.
For more info, see section Should I get S in source form or binary?, and section Dynamic Loading in S.
In general, the person installing S on a machine should be familiar with the machine and its operating system, and should have some moderate level of computer sophistication. It requires much more training to compile and install S than to use it. If you fear this may be more than you are up to, then you should probably think of getting a binary version of S that is ready to install on your machine.
The installation of S is run under control of a program, and when there are no difficulties, it runs quickly and without intervention. On certain machines, S has been installed, including all compilations, in under an hour (about 15 minutes to set everything up and then wait for the compiles). Of course, that is when everything goes right. There is no known upper bound on the time to install.
For more information, see section Should I get S in source form or binary?.
You should have approximately 40Mb of free disk space before installing the S source code and attempting to compile it. After the compilation process, you can remove many files, producing an executable version of S that occupies approximately 10-15Mb of disk (the precise number depends on the machine and operating system).
S-PLUS is distributed in binary form, so installation is quick: a few minutes spent adapting it to the local environment (the exact time depends on that environment).
MathSoft recommends 45 MB of storage space for UNIX and 80 MB for Windows. (A lesser amount of disk space can be used to install a partial version of S-PLUS.) In addition, 48 MB or more of swap space is recommended for the UNIX version, where the swap space actually required is dependent on the size of the data sets analyzed.
S runs on a wide range of computers, from powerful personal computers to large mainframes. Most people use S on "professional workstations" such as those manufactured by DEC, Hewlett Packard, Silicon Graphics, Sun Microsystems, and others. High-end personal machines, notably those based on the Intel 80486 or Pentium architectures, are also reasonable candidates for running S. (For more information, see section What operating system does S need?.)
S-PLUS is available on a variety of UNIX and Windows computers. Please refer to the table below to see platform, hardware requirements, OS requirements and expected upgrade dates. All current S-PLUS versions are based on the AT&T S version dated May, '92.
  Platform               OS req'd                   S-PLUS release number
  ------------------------------------------------------------------------
  SPARC and SPARC        SUNOS 4.1.3/4.1.4          3.4
    compatibles          Solaris 2.3/2.4/2.5
  DECstation             Ultrix 4.4                 3.4
  DEC Alpha              OSF 3.2                    3.4
  HP 9000-7xx            HP-UX 9.x                  3.4
    and 8xx
  IBM RS-6000            AIX 3.2.5                  3.4
  SGI IRIS-4D,           IRIX 5.2/5.3/6.0           3.4
    Indigo, IRIS
    compatibles
  S-PLUS for WINDOWS     Windows 3.1,               4.0
                         Windows 3.1.1,
                         Windows 95, Windows NT
  ------------------------------------------------------------------------

  RAM MEMORY REQUIRED:
    UNIX     12 MB
    Windows  32 MB

  HARD DISK SPACE REQUIRED:
    Swap Space:
      UNIX     48 MB or more swap space recommended. Swap space actually
               required is dependent on size of data analyzed.
    Hard Disk Storage Space:
      UNIX     45 MB
      Windows  80 MB
In all cases, more memory will result in improved S-PLUS performance.
If the machine you are interested in running S-PLUS on is not on the list above, contact the MathSoft Sales Department at 800-569-0123 or at mktg@statsci.com for more information.
That's particularly difficult to answer and even if we came up with an answer today, it would likely be different tomorrow. Because S operates on a wide range of machines that run the Unix operating system, and because the newest, most powerful and cost-effective workstations normally run Unix, S generally operates on the "best" machines at any point in time.
S was designed to work with the Unix operating system; it works with the System V, Berkeley and Research versions. Several organizations have made S work with other operating systems, too, including DEC's VMS and Microsoft's Windows 95.
For more information, see section What machines does S/S-PLUS run on?, and section Will S/S-PLUS run on machine X running OS Y?.
One of the powerful features of S is its unified capability for expressing statistical models. The 1992 book, `Statistical Models in S', is written specifically for people who want to perform statistical computations. It describes how to use S to carry out a wide range of computations for techniques such as linear models, analysis of variance, generalized linear models, generalized additive models, smoothing, tree-based models, and non-linear models.
Simpler and more classical statistical computations can easily
be programmed in S; often you will find someone has already done
the work, e.g. the MASS package available from statlib (see section
What is the statlib server? How can I access it?, for information on
statlib) and in S-PLUS.
You type an expression to S; S evaluates it and displays the answer. Thus, S works something like a desk calculator. The difference is that S can operate with large collections of data at once, so one expression might produce a graph, fit a line to a set of points, or carry out another complex operation.
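As a small illustration of the "desk calculator" style described above, here is a sketch of a short session, written in the S language. The data are made up for the example, and lsfit() is assumed to be the least-squares fitting function available in your version of S.

```r
# A small made-up data set: ten x values and a response on a straight line.
x <- 1:10
y <- 3 + 2 * x

# One expression summarizes the whole collection of values at once.
mean(y)

# One expression fits a least-squares line to all ten points.
fit <- lsfit(x, y)
fit$coef
```

Each expression operates on the whole vector, so no explicit loop over the ten observations is needed.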
S is a language that conforms to a particularly small, uniform set of rules. That means that the S language itself is easy to learn. In fact, most non-programmers find S very natural; programmers occasionally have trouble with S concepts because they are so much more general than those in traditional programming languages. It will take some time, though, to become familiar with the large number of functions supplied with S.
S-PLUS 4.0 for Windows adds a customizable point-and-click interface with statistics and graphics menus and dialogs.
For more info, see section What documentation is available for S, S-PLUS?,
section What is the statlib server? How can I access it?, and
section Are archives of the S-news digests available?.
The primary references for S are two books by the creators of S.
Two somewhat dated books describing early versions of S are
S-PLUS comes with its own extensive set of manuals. Note that due to nontrivial printing costs the Reference Manuals must currently be purchased separately.
S and S-PLUS both contain online documentation for all of their
functions via the help() function. In the UNIX version of S-PLUS
the help.start() function provides a convenient menu-driven help
system, while in the Windows version help is provided through the
Windows Help system.
Two guides to S-PLUS are available from the S directory of statlib
(see section What is the statlib server? How can I access it?, for
information on statlib), both of which contain much material
useful to users of S of August 1991 or later versions.
The index entries are:
  * (ripley@stats.ox.ac.uk)
  * (venables@stats.adelaide.edu.au) and
    David Smith (D.M.Smith@lancaster.ac.uk).
Using a Web browser, these notes may also be obtained from Lancaster
University.
These are also available by anonymous ftp from markov.stats.ox.ac.uk
[192.76.20.1] in directory pub/S (see the file README for current
details) and on statlib. (See section What is the statlib server? How
can I access it?, for information on statlib.)
Other books which discuss particular aspects of S and S-Plus include the following.
For readers of Japanese there are
For readers of German there is
What is the statlib server? How can I access it?
statlib is a system for distributing statistical software by
electronic mail, ftp, and World Wide Web.
The easiest way to access statlib is using a Web browser
(e.g. Mosaic) with a URL of http://lib.stat.cmu.edu/.
To access the statlib mail server, send a mail message to
statlib@lib.stat.cmu.edu. For starters, send a message
containing the following:
  send index
  send index from S
This will give you an index of the general and S-specific
material available on the statlib server.
Remember that the server does not understand English or any other language. Your requests must be exactly in the form specified.
Anonymous ftp access is also available. Type ftp lib.stat.cmu.edu.
At the login prompt, type "statlib" (without the quotes)
and give your e-mail address as the password.
A `mirror' of the statlib archive in the UK is available at
unix.hensa.ac.uk. For details on the mail server, send email to
netlib@unix.hensa.ac.uk with a body of send browser.
The site can also be accessed by telnet (log in as 'archive'),
by anonymous ftp, or by WWW with the URL http://www.hensa.ac.uk/.
The statlib archive is under /statlib.
Are archives of the S-news digests available?
Archives of the S-news digests are available at statlib, in the
directory s-news, and these can be requested by e-mail or
retrieved by ftp. There were 175 digests as of November, 1994.
You can search the digests by keyword. The format of the find command in an e-mail is:
  find <digest_number> <keyword>[ <keyword>..] in s-news
For example, to search digest1 for the keywords `regression' and
`transformation', mail the following to statlib:
  find digest1 regression transformation in s-news
Note that the word "all" in place of <digest_number> will
search all digests. The introductory message from statlib
gives more details.
See section What is the statlib server? How can I access it?, for
information on statlib.
S comes with functions designed to read ASCII files. It also has the ability to invoke commands in the operating system and to interface with C and Fortran programs. These can be used to access data kept in other forms, in database management systems, etc.
The scan() function can be used to read data from a text
file or interactively from standard input. The function
make.fields() can be used to create fields with a specified field
separator so that the file can be used as input to scan().
The read.table() function reads an ASCII file and creates a
data frame. (Refer to the White book for information about data
frames; see section What documentation is available for S, S-PLUS?,
for more info.)
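The two reading functions just described can be sketched as follows. This is only an illustration: the file names and their contents are made up, and cat() is used merely to create the files so that the example is self-contained.

```r
# Create two small text files to read back (hypothetical names and data).
cat("ann 31 1.62\n", "bob 45 1.80\n", "cy 28 1.75\n",
    file = "people.dat", sep = "")
cat("name age height\n", "ann 31 1.62\n", "bob 45 1.80\n", "cy 28 1.75\n",
    file = "people.tab", sep = "")

# scan() reads the fields in order; the "what" list gives one template
# per column: "" for character data, 0 for numeric data.
cols <- scan("people.dat", what = list(name = "", age = 0, height = 0))
cols$age

# read.table() reads the file with a header line into a data frame.
people <- read.table("people.tab", header = T)
mean(people$age)
```

scan() returns a list with one component per column, while read.table() returns a data frame whose columns are named after the header line.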
S-PLUS 4.0 for Windows allows data import and export from a variety of file
formats such as Excel, SAS, and SPSS. See the File:Import Data:From File
menu item or the import.data() and export.data() functions.
The write() function allows you to write S data into
a file in ASCII format.
The functions print(), format(), cat() and paste() can be used to
format output to be written to files.
The sink() function diverts the output from S/S-PLUS
commands into a file.
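A minimal sketch of these output functions working together follows. The file names are hypothetical, and the append= argument of cat() is assumed to be available in your version.

```r
x <- c(1.5, 2.25, 3)
out.file <- "results.txt"   # hypothetical file name

# write() puts the values in the file, here three per line.
write(x, file = out.file, ncolumns = 3)

# cat() appends a line built with paste() and format().
cat(paste("mean =", format(mean(x))), "\n",
    file = out.file, sep = "", append = T)

# sink() diverts printed output to a file until sink() is called
# again with no arguments.
sink("session.txt")
print(summary(x))
sink()
```

After this runs, results.txt holds the three values on one line followed by the formatted mean, and session.txt holds whatever print() would have shown at the terminal.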
The data.dump(), dump() and dput() functions write
S objects into ASCII files but not in regular text format. They are
used for data transfer between machines.
For more info, see section Can S/S-PLUS objects be transferred from one machine to another?, and section When is dump() and restore()/source() preferable to data.dump()/data.restore()?.
S objects are stored as binary files for efficiency when they are accessed. Because these files contain hardware-dependent information (floating point representations, for example), they should not be moved directly from one machine to another unless you are sure that the underlying machine arithmetic and storage policies are identical.
The portable way to move S objects is to convert them to an ASCII
file using the data.dump() function; the file can be moved to
the new machine and the objects recreated using data.restore().
For more information, see section When is dump() and restore()/source() preferable to data.dump()/data.restore()?.
When is dump() and restore()/source() preferable to data.dump()/data.restore()?
The only advantage of dump()/restore() over data.dump()/data.restore()
is that the ASCII file produced is easy to read and change.
Thus dump() is often used to produce ASCII files of S functions which
are then edited and redefined using source() or restore(). However,
for any S data objects that are not small, data.dump() is recommended
since it is faster and uses much less memory.
When dump() and restore() were created they were intended
to accommodate
the entire range of S data structures using the same syntax as the
S language did. restore() parses and then evaluates each dump'ed
object. If you have a matrix containing 10000 numbers, restoration
of the file executes the c() function with 10000 arguments.
That takes quite a bit of space to parse and evaluate.
data.dump() and data.restore() are designed to be used in
the same way as dump() and restore() but deal with an
ASCII representation that can be efficiently turned back to S objects.
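The edit-and-redefine cycle for functions mentioned above can be sketched like this. The function and the file name "cube.q" are made up for the illustration; for data objects of any size, data.dump() remains the recommended route, as explained above.

```r
# A small function worth keeping in an editable text file.
cube <- function(x) x^3

# dump() writes S source text that recreates the named objects.
dump("cube", "cube.q")      # "cube.q" is a hypothetical file name

# ... the file can now be edited with any text editor ...

# Remove the object, then redefine it from the (possibly edited) file.
rm(cube)
source("cube.q")
cube(2)
```

Because the file contains ordinary S source, it can be read and changed by hand, which is exactly the advantage the text attributes to dump().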
Why does round() sometimes not print rounded values?
There are two stages in rounding. The first step is producing an internal representation of the rounded value. For example,
  > x <- .123450000001
  > y <- round(x,3)
uses the machine's floating point arithmetic to produce the
best approximation to the numeric value .123 in y.
That's all the round() function does.
The next step is printing this value or incorporating it into a text string. It is at this stage that things can go astray. The S print function, invoked automatically when S objects are printed, tries hard to produce a pretty visual representation of the value being printed.
  > x
  [1] 0.12345
  > y
  [1] 0.123
(options(digits=) controls how many digits the print function
thinks are important).
Other functions that convert numeric to character may not produce results as "pretty" as print does:
  > as.character(x)
  [1] "0.123450000001"
  > as.character(y)
  [1] "0.123"
Depending on the machine's arithmetic, there may even be instances
where as.character() (or cat() or paste()) will
produce extra digits from a rounded value.
The solution is to use the function format() to turn the numeric
value into a "pretty" character value:
  > format(x)
  [1] "0.12345"
This is particularly important in building character strings:
  > paste("r = ",format(round(x, dig=4)))
  [1] "r =  0.1235"
Before you write a major function, check on statlib to see if
there is something similar to what you need. (See section Where can I
get contributed functions?, for information on obtaining contributed
functions, and section What is the statlib server? How can I access
it?, for info on statlib.) A question to the S-news list with a brief
description of what you want (see section Asking questions of the
mailing list.) might elicit useful responses.
Some general guidelines for function writing are given below:
Use full names for arguments and function names; args can be abbreviated, so the full name doesn't hurt. There are lots of functions, so a good name is important.
Provide reasonable defaults for arguments.
Read current S code to see (some) examples of good style.
Start simply, get something working immediately and build capabilities gradually and interactively. Try to think of your computation in "whole data" terms. What is it trying to produce as a final result? Don't rush in to write it as a sequential Fortran algorithm.
Use self-checking computations while doing interactive data analysis with S. Try to think of ways of checking your work. For example, sum(resid) == 0.
The function browser() could be of help in debugging your functions.
Try to deal with the most general situation if not too ugly. For example, NAs, character data, lists, 0-length args, rather than just numeric vectors. On the other hand, it's easy to get too ugly. It's better to have short simple computation that does 90% of all cases than to try to accommodate all things.
Do appropriate error checks on args if the standard error message is cryptic.
Try to avoid explicit loops if there are suitable primitives available
that can do the job. (Note that some primitives also use
loops, e.g. apply(). They are, however, likely to be written with more
care than you might be willing to give.)
Be especially careful of building up a vector element by element in loops. When necessary, element by element computations should be done by creating an object and then replacing pieces of it rather than having an object grow by gluing together pieces.
Use comments where appropriate but save blocks of text for the online documentation. (You are writing online documentation, right?)
Graphics functions should change as little of the graphics state as possible. This allows the user (or function) that calls the graphics function to achieve its own specialization.
Use on.exit() to clean up -- graphical parameter changes, removing
temp files, etc.
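Several of the guidelines above can be illustrated together. The following is only a sketch: the function names running.mean and two.panel, their arguments, and the data are all hypothetical, and the tolerance used in the self-check is an arbitrary choice.

```r
# Running mean over a window of "width" values (hypothetical function).
# Full argument names and a reasonable default, as suggested above.
running.mean <- function(x, width = 3)
{
    # Check arguments where the built-in error would be cryptic.
    if(!is.numeric(x)) stop("x must be numeric")
    if(width < 1 || width > length(x))
        stop("width must be between 1 and length(x)")

    n <- length(x) - width + 1
    # Create the result at full length, then replace its pieces,
    # rather than growing it with c() inside the loop.
    result <- numeric(n)
    for(i in 1:n)
        result[i] <- mean(x[i:(i + width - 1)])
    result
}

# A graphics function that restores the graphical state it changes.
two.panel <- function(x, y)
{
    old.par <- par(mfrow = c(1, 2))
    on.exit(par(old.par))    # runs even if a plotting call fails
    plot(x, y)
    hist(y)
    invisible(NULL)
}

# Self-check: least-squares residuals should sum to zero, but only up
# to floating point error, so compare against a small tolerance.
x <- 1:20
y <- 2 * x + c(-1, 1)
fit <- lsfit(x, y)
abs(sum(fit$residuals)) < 1e-8
```

Note that the self-check uses a tolerance rather than an exact comparison with zero; with floating point arithmetic an exact test can fail even when the computation is correct.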
Contributed functions are available on statlib.
Check the index on statlib to find out which functions are
available. (See section What is the statlib server? How can I access
it?, for info on statlib.)
Dynamic loading is implemented by the S function dyn.load().
(See "The New S Language", Ch. 7, pgs 193-204).
This function will take the object file (typically output
by a C or Fortran compiler) and load it into memory so that
it can be executed by the S functions .C() or .Fortran().
During the loading, dyn.load() attempts to resolve any references
to other routines. These references can come from explicit
subroutine calls or from implicit calls to library routines.
Unfortunately, the implementation of dyn.load() is difficult and
dependent on hardware and the operating system, so the AT&T distribution
of S provides it only for Vax and Motorola 68000-based architectures.
AT&T does not supply dyn.load() for Sun's Sparc architecture.
MathSoft provides a working version of dyn.load(), dyn.load2(),
and/or dyn.load.shared() for each Unix architecture on which S-PLUS
runs.
For more info, see section What is the statlib server? How can I access it?.
Static loading is another way of loading subroutines with S. Static loading creates a local version of S in your current working directory. Remember, however, that a copy of S requires about 6Mb of space and must be recreated whenever changes are made to S.
You would use static loading if:
  * your system does not provide dyn.load().
  * loading with dyn.load()/dyn.load2() would be too slow. A particular
    example is when the loaded code depends on libraries and dyn.load2()
    complains about missing or duplicate symbols; since dyn.load2() is
    slow, it may be faster in this case to use static loading instead of
    repeated calls to dyn.load2().
Yes, you can use the function call_S within C to call an S/S-PLUS
function from within a C program. See section 7.2.4 of the blue book,
and the S-PLUS reference manual.
Note that C code calling S must be linked into the S executable
(via dynamic or static loading).
win.graph(), and includes a printer driver win.printer().
S-PLUS 4.0 for Windows introduces a new Windows graphics driver
graphsheet() which generates point-and-click editable graphics.
Execute the expression
help(Devices)
S-PLUS 4.0 for Windows provides a wealth of menu and dialog based functionality, including completely extensible and customizable menus and dialogs.
S-PLUS 3.4 for UNIX does not have built-in statistics menus, but does include tools for building menus and dialogs.
Most software to support dynamic graphics is tuned to a particular
output device. Since S provides a device-independent
graphical system, there are no dynamic graphics applications
that are part of S. However, S has been used effectively as a
platform from which device-dependent graphics code can be executed.
In this case, S provides for data management, computations, etc., and
hardware-specific routines are called to produce the dynamic displays.
For users on Silicon Graphics machines, S provides library(brush) which
implements brushing and point cloud rotation using SGI's gl library.
S-PLUS does have dynamic graphics using the X and sunview window
systems; see its brush() and spin() functions.
S-PLUS 4.0 for Windows makes it easy to export graphics to a wide variety of formats through the File:Export Graph menu item.
For S-PLUS 3.4 for UNIX, a summary of comments by Bill Venables, Dave Smith and Brian Ripley follows.
The alternatives are either to produce PostScript directly from
S/S-PLUS, or to go via a graphical representation such as that of fig
(a public domain drawing package).
postscript()
driver, as in
postscript(file="1.eps", height=4, width=5, horiz=T, pointsize=8)If you use
postscript()
directly, remember to call
graphics.off()
(or quit S) after finishing the plot calls. S-PLUS
users can call postscript()
via dev.print()
.
pscript()
driver, which can
be used either directly (with "onefile=F"
and calling
graphics.off()
after use) or via
dev.print(pscript, onefile=F, print=F, ...)
rmv filename
and click on print. [Here rmv
is a shell script with contents
mv $2 $1
.] If this is available, this is the easiest way.
fig() driver obtainable from statlib by send fig from S. (See section
What is the statlib server? How can I access it?, for information on
statlib.)
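As a minimal sketch of the first alternative (writing PostScript directly), the sequence below opens the driver, draws a plot, and closes the device. It is written in S syntax that should also run in R. The plotted data and the temporary file name are placeholders, and the argument horizontal is assumed here to be the full name of the horiz abbreviation used above.

```r
# Hypothetical output file; any writable path would do.
eps.file <- paste(tempfile(), ".eps", sep = "")

# Open the PostScript driver: a 5 x 4 inch portrait figure, 8 point text.
postscript(file = eps.file, height = 4, width = 5,
           horizontal = FALSE, pointsize = 8)

# Placeholder plot; replace with the real plotting calls.
plot(1:10, (1:10)^2, type = "b", xlab = "x", ylab = "x squared")

# Close the device so that the file is flushed and complete.
graphics.off()
```

Until graphics.off() has been called (or S has been quit) the file may be incomplete, which is why the call comes last.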
To include PostScript in TeX/LaTeX documents you need to consult the
details of your dvi-to-ps program. Two macro packages, epsf and psfig,
make the job much easier. Both are distributed with Tomas Rokicki's
dvips, obtainable from labrea.stanford.edu in ~ftp/pub.
In 3/93 the latest version was 5.514. Other versions of epsf and psfig
are available for other dvi-to-ps programs, from a wide variety of
archives. A wide range of PostScript editors are available, and
cognoscenti can edit PostScript directly.
Fig-format plots can be edited with xfig and converted to
Encapsulated PostScript (and a number of other formats) with fig2dev.
(Both are now version 2.1.) They are part of the X11R5
distribution, but can be obtained separately by anonymous ftp from
export.lcs.mit.edu in the directory /pub/R5untarred/contrib/clients.
Alan M. Zaslavsky has placed an archive of contributed collections,
named postscriptfonts, on statlib (see section What is the statlib
server? How can I access it?, for information on statlib). A short
description of the files is given below.
Functions to display postscript fonts and, using the postscript()
driver, to add text to a plot (or the margin of a plot) that contains
mixed fonts (including Greek), mixed character sizes, and local
motions (e.g., sub and superscripts).
fontdemo
ps.show.fonts
mixed.text
mixed.mtext
mixed.text.vector
mixed.mtext.vector
ps.preamble.ISO.LATIN (as in postscript(preamble=ps.preamble.ISO.LATIN))
Brian Ripley created the following tutorial, which was posted to
s-news on 7 Aug 92 and is archived in digest81 on statlib.
This is a series of hints on memory usage in S, from a real teaching example. We are running S-Plus 3.0 (August 1991 S) on Sun Sparcs; the examples were computed on an IPC with 12Mb RAM, 32Mb swap on a local disc.
Brian Ripley ripley@stats.ox.ac.uk
Consider a shoe experiment with 10 boys, an experiment reported in Box, Hunter & Hunter (1977), Statistics for Experimenters. There were two materials (A and B) that were randomly assigned to the left or right shoe:
shoes <- scan(,list(L=0, R=0))
13.2 14.0 8.2 8.8 11.2 10.9 14.3 14.2 11.8 10.7
6.6 6.4 9.5 9.8 10.8 11.3 9.3 8.8 13.3 13.6

attach(shoes)
t.test(L,R, paired=T)
The sample size is rather small, and one might wonder about the validity of the t-distribution. An alternative for a randomized experiment such as this is to base inference on the permutation distribution of d. Computation shows that the agreement is very good, but that computation causes problems in S.
The most obvious way to explore the permutation distribution of the t-test of d = L-R is to select random permutations. The supplied S-Plus function t.test computes much more than we need, and so is rather slow (about 0.7 secs on a Sun SparcStation SLC). It is simple to write a replacement function to do exactly what we need.
attach(shoes)
d <- L-R
ttest <- function(x) mean(x)/sqrt(var(x)/length(x))
n <- 100
res <- numeric(n)
for(i in 1:n){
  x <- d*sign(runif(10)-0.5)
  res[i] <- ttest(x)
  print(c(i, memory.size()))
}
This took about 70 secs and used an additional 1Mb of memory! Increasing the sample size to 1000 causes seriously antisocial paging activity, and takes about 3 hours.
As the permutation distribution has only 2^10 = 1024 points we can explore it directly:
n <- 1024
perm.res <- numeric(n)
for(i in 1:n){
  j <- i; x <- d
  for(k in 1:10) {x[k] <- x[k]*(2*(j%%2)-1); j <- j%/%2}
  perm.res[i] <- ttest(x)
  print(c(i, memory.size()))
}
par(mfrow=c(1,2))
hist(perm.res, 25, probability=T, xlab="diff")
x <- seq(-4,4, 0.1)
lines(x, dt(x,9))
sres <- c(sort(perm.res), 4)
yres <- (0:1024)/1024
plot(sres, yres, type="S", xlab="diff", ylab="")
lines(x, pt(x,9), lty=3)
legend(-5, 1.05, c("Permutation dsn","t_9 cdf"), lty=c(1,3))
which took about 5 hours and 17Mb of memory.
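As an aside, the same 1024 statistics can be computed without any explicit loop by building the whole matrix of sign patterns at once. The sketch below is not part of the original tutorial; it is written in S syntax that should also run in R. The names signs, perm.t, t.obs and p.perm are illustrative, and the data are retyped from the listing above.

```r
# Paired differences d = L - R for the ten boys, retyped from the data above.
L <- c(13.2, 8.2, 11.2, 14.3, 11.8, 6.6, 9.5, 10.8, 9.3, 13.3)
R <- c(14.0, 8.8, 10.9, 14.2, 10.7, 6.4, 9.8, 11.3, 8.8, 13.6)
d <- L - R
k <- length(d)

# Row i of 'signs' holds the k binary digits of i-1, recoded from 0/1 to
# -1/+1, so the 2^k rows enumerate every sign pattern exactly once.
idx <- 0:(2^k - 1)
signs <- 2 * outer(idx, 2^(0:(k - 1)), function(i, p) (i %/% p) %% 2) - 1

# Mean of each sign-flipped sample. Flipping signs leaves sum(d^2)
# unchanged, so every variance follows from the means alone.
m <- as.vector(signs %*% d) / k
v <- (sum(d^2) - k * m^2) / (k - 1)
perm.t <- m / sqrt(v / k)

# Observed statistic and two-sided permutation p-value; the small
# tolerance keeps exact ties from being lost to rounding.
t.obs <- mean(d) / sqrt(var(d) / k)
p.perm <- mean(abs(perm.t) >= abs(t.obs) - 1e-12)
```

The only large object is the 1024 x 10 matrix signs, so storage is fixed in advance and does not grow with the number of iterations.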
The problem is that S does not release any memory until the loop is completed, so the memory usage is linear in the size of the loop. It may help to encapsulate the contents of the loop in a function:
n <- 100
res <- numeric(n)
test.t <- function(x){
  res <- ttest(d*sign(runif(10)-0.5))
  print(c(i, memory.size()))
  res
}
for(i in 1:n) res[i] <- test.t(x)
but the resources used in this instance are virtually unchanged. We can of course run 10 loops of length 100, provided these are done sequentially and not in a loop:
attach(shoes)
d <- L-R
ttest <- function(x) mean(x)/sqrt(var(x)/length(x))
test.t <- function(x){
  res <- ttest(d*sign(runif(10)-0.5))
  print(c(i, memory.size()))
  res
}
res <- numeric(1000)
for(i in 1:100) res[i] <- test.t(x)
for(i in 101:200) res[i] <- test.t(x)
for(i in 201:300) res[i] <- test.t(x)
for(i in 301:400) res[i] <- test.t(x)
for(i in 401:500) res[i] <- test.t(x)
for(i in 501:600) res[i] <- test.t(x)
for(i in 601:700) res[i] <- test.t(x)
for(i in 701:800) res[i] <- test.t(x)
for(i in 801:900) res[i] <- test.t(x)
for(i in 901:1000) res[i] <- test.t(x)
but there is yet another catch. If this is run from a file with source() the memory is not released after each loop. It is necessary to put these instructions in a file t.test.s and use
Splus < t.test.s
or to cut-and-paste the instructions into a window running S. With an input file, it can be helpful to include
options(echo=T)
to echo the input. This approach can be taken as far as needed, even to listing each step of the loop on the file to minimize memory usage.
For extensive runs you may want to use the BATCH mode of S, run from the Unix command line as
Splus BATCH infile outfile
which runs a background job (at reduced priority 5) taking input from infile and writing output to outfile. (Note that options(echo=T) is set automatically in this mode.)
The corresponding command under Windows is
Splus /BATCH infile outfile
Generating a bootstrap sample in S is very easy:
sample(x, replace=T)
samples with replacement length(x) items from x. The difficulty comes from the memory problems. We can avoid explicit loops by the following device:
n <- 100
A <- matrix(rep(d,n),10,n)
res <- apply(A,2,function(x) ttest(sample(x, replace=T)))
print(memory.size())
but this is once again subject to memory build-up, both to store A, which needs 8 x 10 x n bytes, and for the internal for loop in apply. However, we can do the computation in chunks, from the command-line or an input file:
n <- 250
A <- matrix(rep(d,n),10,n)
res <- apply(A,2,function(x) ttest(sample(x, replace=T)))
res <- c(res, apply(A,2,function(x) ttest(sample(x, replace=T))) )
res <- c(res, apply(A,2,function(x) ttest(sample(x, replace=T))) )
res <- c(res, apply(A,2,function(x) ttest(sample(x, replace=T))) )
which used about 3Mb and took 7 minutes. The alternative approach of a series of four explicit for() loops uses essentially the same memory but took 40 seconds longer.
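One further possibility, not taken from the original tutorial, is to draw every resample with a single call to sample() and to compute all the statistics by matrix arithmetic, so that neither an explicit loop nor apply is involved. This is only a sketch in S syntax that should also run in R. The names pos, B and boot.t are illustrative, and no timing or memory figures are claimed for it.

```r
# Paired differences d = L - R, retyped from the shoe data above.
L <- c(13.2, 8.2, 11.2, 14.3, 11.8, 6.6, 9.5, 10.8, 9.3, 13.3)
R <- c(14.0, 8.8, 10.9, 14.2, 10.7, 6.4, 9.8, 11.3, 8.8, 13.6)
d <- L - R
k <- length(d)
n <- 1000

# One call draws all k*n resampled positions; column j of B is the
# j-th bootstrap sample. B needs 8 x k x n bytes, like A above.
pos <- sample(1:k, k * n, replace = TRUE)
B <- matrix(d[pos], k, n)

# Column means and variances via matrix products instead of a loop.
ones <- rep(1, k)
m <- as.vector(ones %*% B) / k
v <- (as.vector(ones %*% B^2) - k * m^2) / (k - 1)
boot.t <- m / sqrt(v / k)
```

A resample whose ten values all coincide would have zero variance and give a non-finite statistic. With these data that is extremely unlikely, but a careful version would check for it.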
However they are done, S is not very suitable for long series of simulations of small problems. These are better done by an external computer program or a special subroutine calling a Fortran or C program.
If S must be used, the ultimate approach is to split the problem up into small pieces and to launch a child S process for each part. It helps that assignments are permanent. The basic style is
unix("Splus < infile >> outfile", output.to.S=F)
and as an example, create a file named t.test.s containing the commands
for (i in 1:100) res1[i] <- ttest(sample(d, replace=T))
res <- c(res, res1)
then within Splus
attach(shoes)
d <- L-R
res1 <- numeric(100)
res <- numeric(0)
ttest <- function(x) mean(x)/sqrt(var(x)/length(x))
for(i in 1:10) unix("Splus < t.test.s >> junk", output.to.S=F)
This took about 4Mb (as 2 S processes run) and 9 minutes. Note that loops can safely be used here, as the memory build-up occurs in the child process. However, the overhead of launching the S child process is large, so the parts should be fairly large.
The assign() function may be used with paste() to create variable
names in a loop. For example, suppose we have a list my.list of
length n and we want to create variables x1,...,xn each containing
the component comp of the corresponding element of my.list. This may
be done using
for (i in 1:n) assign(paste("x",i,sep=""), my.list[[i]]$comp)
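A small worked instance may make this concrete. The list contents and the extra component label below are made up for illustration, and the code is written so that it should also run in R. Note the double brackets, which extract the list element itself rather than a sublist of length one.

```r
# A made-up list of two elements, each with a component named 'comp'.
my.list <- list(list(comp = 1:3, label = "a"),
                list(comp = 4:6, label = "b"))
n <- length(my.list)

# Create x1, ..., xn, each holding the 'comp' component of one element.
for (i in 1:n) assign(paste("x", i, sep = ""), my.list[[i]]$comp)

# The same naming trick retrieves a variable whose name is built at run time.
second <- get(paste("x", 2, sep = ""))
```

After the loop, x1 holds 1:3 and x2 holds 4:6.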
Looking for..."function", ignored one...?
This happens when an object you have created has the same name as a function somewhere later on your search list. S knows that it is looking for a function, so it ignores your object with the same name. To avoid warning messages such as this, rename or remove your object. Such messages often result from creating objects with simple names like "c", "q", or "t" that are also the names of standard S functions.
Of course, if you create a function on the working directory with the same name as a built-in function, then your function will be used instead of the built-in function.
S-PLUS has a function, called masked(), that lists objects in your
.Data that have the same name as other objects on the search list.
A similar function, called find(), lists all directories in which
its argument name exists.
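The lookup rule described above can be seen in a few lines. This sketch should also run in R, which follows the same rule but prints no message; the variable names are illustrative only.

```r
# A data object that happens to share its name with the transpose function.
t <- c(2, 4, 6)

# In a function call, S skips the data object and finds the function.
m <- t(matrix(1:6, 2, 3))
t.is.data <- is.numeric(t)    # the name alone still refers to the data

# Removing (or renaming) the object leaves only the built-in function.
rm(t)
restored <- is.function(t)
```

Here m is the 3 x 2 transpose, and after rm(t) the name t refers only to the function again.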
A nice way to run S under Unix is to use S-mode within the Emacs editor. Features include recall of past commands, a session log, and easy editing of functions and scripts.
A current version of the emacs-lisp software for running S-mode is
available from statlib as /S/gnuemacs4.
Using a Web browser, the most recent version of the software
may be obtained from Lancaster University.
For questions on S-mode for Emacs, ask for help on the S-mode
mailing list. To get information describing this mailing list send a
message to majordomo@stat.math.ethz.ch with a body containing the
line info S-mode. Send mail to S-mode-request@stat.math.ethz.ch in
order to subscribe or unsubscribe.
This document was generated on 29 October 2002 using texi2html 1.56k.