Frequently Asked Questions about S.

Introduction

$Revision: 1.24 $

This document contains answers to some of the most frequently asked questions in the S-news mailing list and sci.stat.math newsgroup about S and S-PLUS. They're all good questions, but they come up often enough that substantial net bandwidth can be saved by looking here first before asking.

MathSoft now has its own Web site, which provides useful material that is kept more current than the information provided here. The URL for this site is http://www.mathsoft.com/.

This list is currently available in ASCII, TeXInfo, and HTML versions. These are maintained by Charles Roosen <charles@playfair.stanford.edu> with the gracious assistance of Martin Maechler and many subscribers to the S-news mailing list.

The ASCII version is archived as /S/faq on statlib. For information on accessing statlib, see the section "What is the statlib server? How can I access it?".
The TeXInfo version is archived as /S/faq.texinfo on statlib.
The HTML version (for WWW, i.e. "World Wide Web") is available using a Web browser such as Mosaic. The URL is http://www.stat.math.ethz.ch/S-FAQ.

The primary document is the TeXInfo file. The HTML version is automatically produced from the TeXInfo version using texi2html. The ASCII version is produced from the TeXInfo using makeinfo.

These documents are also available by anonymous ftp from ftp.stat.math.ethz.ch in directory /pub/Doc. Note that unless we develop a script to automatically update the files at statlib, the versions at ftp.stat.math.ethz.ch are more current than those at statlib.

If you have any suggested additions or corrections, please send them to Charles Roosen <charles@playfair.stanford.edu> with a subject line of the form S-faq Suggestion: [Insert Topic Here].

Special thanks to Rick Becker and Brian Ripley for their patience in looking over and making suggestions and corrections to the many attempts that led to this FAQ. Bill Dunlap and Pat Burns were kind enough to go through and suggest modifications. Thanks are also due to the members of the S-news newsgroup, too many to mention, who contributed suggestions and questions for this FAQ. Portions of this FAQ were shamelessly lifted from a similar document provided by Rick Becker and previous answers on this net.

Some general questions, though stated in terms of S only, are applicable to both S and S-PLUS.

Absolutely Important

Subscribing to/unsubscribing from the `S-news` mailing list.

Requests to be added or dropped from the S-news list should be sent to the electronic mail address s-news-request@utstat.toronto.edu. Sending a one line message saying either subscribe or unsubscribe will suffice.

Please do not send mail to s-news@utstat.toronto.edu since any mail sent to this address is automatically forwarded to thousands of other users.

Your local site may have many subscribers to S-news. In that case, it may have a local mailing list, so that a single message from utstat.toronto.edu is sent to the site and propagated to the local list. In this case please contact your system administrator to be added to or dropped from the local S-news list.

Because of these local lists, S-news-request is administered manually, so subscribe and unsubscribe requests are not fulfilled instantaneously. They will be handled in a timely manner, so please be patient.

Asking questions of the mailing list.

Before you ask a question of the mailing list check with your local S expert. If you don't have any luck there, read through this FAQ. If you still don't have an answer to your question go ahead and send mail to s-news@utstat.toronto.edu. When asking questions please specify (if relevant):

Whether you are using S or S-PLUS, and the version. You can find out which version you are running by typing S VERSION or Splus VERSION.
The manufacturer and model of machine you are working on.
The operating system (or the output of uname -a for Unix enthusiasts).
Sometimes the amount of memory (real and virtual) is also relevant.

When answers are sent to you individually and not to the mailing list, it is considered good etiquette to summarize the answers and mail them to the newsgroup.

Guidelines for answering questions on the mailing list.

Try to make your answer broad enough that people other than the original poster may benefit from it. If you consider your answer to have broad interest, you may want to post it to the newsgroup instead of replying directly to the individual who asked. In this case, please make sure that your answer is not a duplication of a previous answer. Because of the manner in which the newsgroup is distributed, messages arrive in different orders at different sites, so don't assume that the message you are answering will arrive before yours. Try to summarize the essential point of the question before your reply, but don't feel obliged to quote the whole question.

General Topics

What is S? What can S be used for?

S is a very high level language and an environment for data analysis and graphics. S was written by Richard A. Becker, John M. Chambers, and Allan R. Wilks of AT&T Bell Laboratories Statistics Research Department. More recently, other Bell Labs researchers have made major contributions to a new modeling capability in S. The S language is the form in which S users express their computations. The environment provides facilities for data management, support for many graphics devices, etc.

S is useful for computation in a wide range of applications. It's a very general tool, so that applications are not restricted to any particular subject area. One way to think of it is to imagine the wide range of applications that can be handled by a spreadsheet program--but think of an even broader range of applications because S is much more flexible for complex computations. As examples, S has been used for computing in business, finance, experimental science, etc.

The authors of S prefer that you not call S a statistics package. Most of the people who use S have no attachment to statistics, and most of the S applications involve basic quantitative computations and graphics. "Package" really doesn't apply well to S, either. The word "package" often connotes a collection of unrelated tools put together under one name; this is unlike S, where all functions are tightly integrated and controlled by the S language.

What is the current S version?

The current S version is the April, 1992 version. A new release is currently being Beta tested and will be released commercially in 1998. This version (Version 4) will incorporate significant changes to the language, yet will be backwards compatible with the existing version of S.

What is S-PLUS? What extras does it have?

S-PLUS is a value-added version of S sold by MathSoft, Inc. S-PLUS is a fully supported and documented application which has been compiled and tested on numerous architectures. It is available in both UNIX and Windows versions.

S is a subset of S-PLUS, and hence anything which may be done in S may be done in S-PLUS. In addition, S-PLUS has extended functionality in a wide variety of areas, including robust regression, modern nonparametric regression, time series, survival analysis, multivariate analysis, classical statistical tests, quality control, and graphics drivers. Add-on modules provide further capabilities for wavelet analysis, spatial statistics, and design of experiments. S-PLUS 4.0 for Windows also introduces a full-featured graphical user interface.

What is the current S-PLUS version?

The current version of S-PLUS is Version 3.4 on UNIX and 4.0 on Windows. For a complete table of operating systems for which S-PLUS is available, see the section "What machines does S/S-PLUS run on?".

What is old S?

The book, "The New S Language", was published in 1988 and introduced the modern version of S that is known today. The version of S prior to that (described in "S: An Interactive Environment for Data Analysis and Graphics") is now called old S and is defunct.

How do I get S?

The source code for S is licensed by AT&T Bell Laboratories, but is distributed exclusively by StatSci. For information on contacting StatSci, see the section "How do I get S-PLUS?".

For each release of S, there is only one version of the source code--it contains instructions for compiling on a variety of platforms. For more information, see the section "Should I get S in source form or binary?".

Should I get S in source form or binary?

If you really want to read (and/or modify) the C and Fortran code that makes up S, then you need a source license. For most people who want to run S, however, a binary version of S is more convenient. It has already been compiled and specialized to a particular machine and thus it can be installed very easily. It may support specialized graphics devices for that machine. It will also be much less expensive, both in initial cost and because you don't need to purchase C and Fortran compilers in order to process the source code.

S is written primarily in the C language, but it also makes use of Fortran subroutines to carry out various numerical algorithms. Thus, in order to compile the S source code, you need both C and Fortran compilers (and the compilers must be compatible with one another so that C programs can call Fortran programs, and vice versa). It may be possible to use a Fortran-to-C translator rather than a compatible Fortran compiler.

At one point in time a binary version of S known as SUCCESS was available for some machines. Currently S is only available in binary from StatSci as a component of S-PLUS.

For more information, see the sections "What software do I need to go along with S?", "How does one install S? How long does it take?", "How much disk space is necessary to install S?", "How much disk space is necessary to install S-PLUS?", "What is the best machine for S?", and "What operating system does S need?".

How much does S/S-PLUS cost?

There are various prices for S, depending on whether you get it in source form or in binary form. The binary version price also depends on the particular machine it is targeted for. There is also an educational price schedule and different prices in different countries. To get current prices appropriate to your situation, talk to a sales representative.

For more information, see the section "How do I get S?".

Prices for S-PLUS, too, cannot be specified because of country dependence and discounts for non-profit and academic users.

For more info, see the section "How do I get S-PLUS?".

How do I get S-PLUS?

An electronic source for information on S-PLUS is the MathSoft Web site. The URL for this site is http://www.mathsoft.com/.

You can get S-PLUS in North America from:

     MathSoft          
     1700 Westlake Ave North
     Suite 500
     Seattle, WA 98109
     (800) 569-0123 toll free in the United States and Canada
     (206) 283-8802
     (206) 283-8691 fax
     E-Mail: mktg@statsci.com

Sales and support outside of North America are provided through distributors. Contact information for local distributors may be obtained from MathSoft at:

     MathSoft 
     Knightway House
     Park Street
     Bagshot
     GU19 5AQ
     England

     +44 276 452299
     +44 276 451224
     shelp@mathsoft.co.uk

Can I get the source code for S-PLUS?

No, MathSoft does not supply source code for S-PLUS.

Are there other supported versions of S?

At one point in time a binary version of S known as SUCCESS was available for some machines. However, this product is no longer produced. MathSoft is the exclusive U.S. distributor of S and S-PLUS. There are local distributors in a number of other countries. MathSoft in Seattle will forward requests to the relevant distributor.

What software do I need to go along with S?

If you run a binary version of S, such as S-PLUS, then you will be supplied with all the software necessary for using the system. You may find it useful to have C or Fortran compilers in order to use the .C() and .Fortran() functions in S--these allow you to add your own compiled algorithms to S.

For more info, see the sections "Should I get S in source form or binary?" and "Dynamic Loading in S".

How does one install S? How long does it take?

In general, the person installing S on a machine should be familiar with the machine and its operating system, and should have some moderate level of computer sophistication. It requires much more training to compile and install S than to use it. If you fear this may be more than you are up to, then you should probably think of getting a binary version of S that is ready to install on your machine.

The installation of S is run under control of a program, and when there are no difficulties, it runs quickly and without intervention. On certain machines, S has been installed, including all compilations, in under an hour (about 15 minutes to set everything up and then wait for the compiles). Of course, that is when everything goes right. There is no known upper bound on the time to install.

For more information, see the section "Should I get S in source form or binary?".

How much disk space is necessary to install S?

You should have approximately 40Mb of free disk space before installing the S source code and attempting to compile it. After the compilation process, you can remove many files, producing an executable version of S that occupies approximately 10--15Mb of disk (the precise number depends on the machine and operating system).

How much disk space is necessary to install S-PLUS?

S-PLUS is distributed in binary form, and installation is quick: adapting it to the local environment takes a few minutes (the exact time depends on that environment).

MathSoft recommends 45 MB of storage space for UNIX and 80 MB for Windows. (A lesser amount of disk space can be used to install a partial version of S-PLUS.) In addition, 48 MB or more of swap space is recommended for the UNIX version, where the swap space actually required is dependent on the size of the data sets analyzed.

What machines does S/S-PLUS run on?

S runs on a wide range of computers, from powerful personal computers to large mainframes. Most people use S on "professional workstations" such as those manufactured by DEC, Hewlett Packard, Silicon Graphics, Sun Microsystems, and others. High-end personal machines, notably machines based on the Intel 80486 or Pentium architectures, are also reasonable candidates for running S. (For more information, see the section "What operating system does S need?".)

S-PLUS is available on a variety of UNIX and Windows computers. The table below lists the supported platforms, operating system requirements, and S-PLUS release numbers, followed by the hardware requirements. All current S-PLUS versions are based on the AT&T S version dated May, '92.

     Platform                 OS required             S-PLUS release
    -----------------------------------------------------------------
     SPARC and SPARC          SunOS 4.1.3/4.1.4,      3.4
     compatibles              Solaris 2.3/2.4/2.5

     DECstation               Ultrix 4.4              3.4

     DEC Alpha                OSF 3.2                 3.4

     HP 9000-7xx and 8xx      HP-UX 9.x               3.4

     IBM RS-6000              AIX 3.2.5               3.4

     SGI IRIS-4D, Indigo,     IRIX 5.2/5.3/6.0        3.4
     IRIS compatibles

     S-PLUS for Windows       Windows 3.1,            4.0
                              Windows 3.1.1,
                              Windows 95,
                              Windows NT
    -----------------------------------------------------------------

      RAM MEMORY REQUIRED:
            UNIX     12 MB
            Windows  32 MB

      HARD DISK SPACE REQUIRED:

        Swap Space:
            UNIX     48 MB or more swap space recommended.
                     Swap space actually required is dependent
                     on size of data analyzed.

        Hard Disk Storage Space:
            UNIX     45 MB
            Windows  80 MB
In all cases, more memory will result in improved S-PLUS performance.

If the machine on which you want to run S-PLUS is not on the list above, contact the MathSoft Sales Department at 800-569-0123 or at mktg@statsci.com for more information.

Will S/S-PLUS run on machine X running OS Y?

`S': Questions of this sort are almost invariably too difficult to answer. S is a large system that depends on a number of components supplied by the operating system. While S is generally written in such a way as to be robust to common differences in operating system implementations, problems sometimes crop up that there is no way to anticipate. The only way to really answer a question like this is to do it and find out what troubles come up. Unfortunately, no one has the resources to try all machine/operating system variations, so you'll have to undertake any particular installation of S source code at your own risk.
`S-PLUS': Contact MathSoft for the answer to this question.

What is the best machine for S?

That's particularly difficult to answer and even if we came up with an answer today, it would likely be different tomorrow. Because S operates on a wide range of machines that run the Unix operating system, and because the newest, most powerful and cost-effective workstations normally run Unix, S generally operates on the "best" machines at any point in time.

What operating system does S need?

S was designed to work with the Unix operating system; it works with the System V, Berkeley and Research versions. Several organizations have made S work with other operating systems, too, including DEC's VMS and Microsoft's Windows 95.

For more information, see the sections "What machines does S/S-PLUS run on?" and "Will S/S-PLUS run on machine X running OS Y?".

Can S do statistical computations?

One of the powerful features of S is its unified capability for expressing statistical models. The 1992 book, `Statistical Models in S', is written specifically for people who want to perform statistical computations. It describes how to use S to carry out a wide range of computations for techniques such as linear models, analysis of variance, generalized linear models, generalized additive models, smoothing, tree-based models, and non-linear models.

Simpler and more classical statistical computations can easily be programmed in S; often you will find that someone has already done the work, e.g. in the MASS package available from statlib (see the section "What is the statlib server? How can I access it?" for information on statlib) and in S-PLUS.

How do I work with S? Is it hard to learn?

You type an expression to S; S evaluates it and displays the answer. Thus, S works something like a desk calculator. The difference is that S can operate with large collections of data at once, so one expression might produce a graph, fit a line to a set of points, or carry out another complex operation.
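As a rough illustration of this "type an expression, get an answer" style, here is a minimal sketch in S syntax. The data values and the object names y.mean and line.coef are made up for the example; lsfit() is the standard least-squares fitting function.

```r
# Hypothetical data: five (x, y) points lying exactly on the line y = 2x.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)

# An expression operates on the whole vector at once.
y.mean <- mean(y)

# A single expression fits a least-squares line; the coefficient
# component of the result holds the intercept and the slope.
line.coef <- lsfit(x, y)$coef
```

Typing y.mean or line.coef at the prompt then displays the stored value.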

S is a language that conforms to a particularly small, uniform set of rules. That means that the S language itself is easy to learn. In fact, most non-programmers find S very natural; programmers occasionally have trouble with S concepts because they are so much more general than those in traditional programming languages. It will take some time, though, to become familiar with the large number of functions supplied with S.

S-PLUS 4.0 for Windows adds a customizable point-and-click interface with statistics and graphics menus and dialogs.

For more info, see the sections "What documentation is available for S, S-PLUS?", "What is the statlib server? How can I access it?", and "Are archives of the S-news digests available?".

Documentation

What documentation is available for S, S-PLUS?

Primary Books

The primary references for S are two books by the creators of S.

R.A. Becker, J.M. Chambers and A.R. Wilks (1988), "The New S Language," Chapman and Hall, London. This book is often called the "Blue book".
J.M. Chambers and T.J. Hastie (1992), "Statistical Models in S," Chapman and Hall, London. This is also called the "White book".

Two somewhat dated books describing early versions of S are

R. A. Becker and J. M. Chambers (1984), "S: An Interactive Environment for Data Analysis and Graphics," Chapman and Hall, London.
R. A. Becker and J. M. Chambers (1985), "Extending the S System," Chapman and Hall, London.

S-PLUS Manuals

S-PLUS comes with its own extensive set of manuals. Note that due to nontrivial printing costs the Reference Manuals must currently be purchased separately.

S-PLUS for UNIX 3.4 Documentation
- A Crash Course in S-PLUS
- A Gentle Introduction to S-PLUS
- Read Me First
- S-PLUS Global Index
- S-PLUS Guide to Statistical and Mathematical Analysis
- S-PLUS Installation and Maintenance Manual
- S-PLUS Programmer's Manual
- S-PLUS Trellis Graphics User's Manual
- S-PLUS User's Manual
- S-PLUS Version 3.4 Supplement
S-PLUS for Windows Documentation
- S-PLUS Guide to Statistics
- S-PLUS Programmer's Guide
- S-PLUS User's Guide

Online Documentation

S and S-PLUS both contain online documentation for all of their functions via the help() function. In the UNIX version of S-PLUS the help.start() function provides a convenient menu-driven help system, while in the Windows version help is provided through the Windows Help system.

Two guides to S-PLUS are available from the S directory of statlib (see the section "What is the statlib server? How can I access it?" for information on statlib). Both contain much material useful to users of S versions dated August 1991 or later.

The index entries are:

`sguide'
`sguide.ps1'
`sguide.ps2': "Introductory Guide to S-PLUS". A beginners' guide to doing statistics in S-PLUS. SGuide is a shar archive of LaTeX source, styles, figures and data. SGuide.ps1 and SGuide.ps2 are PostScript full size and reduced 2-on-1 respectively. Archive ripley is also needed. Submitted by Brian Ripley (ripley@stats.ox.ac.uk)
`splusnotes': Instructions for obtaining the LaTeX (and postscript) source, and associated data, for a short course on S-PLUS. The document talks mostly about plain S features and it does not concentrate on features specific to S-PLUS. Very useful as an introductory document. Supersedes snotes. Created by Bill Venables (venables@stats.adelaide.edu.au) and David Smith (D.M.Smith@lancaster.ac.uk). Using a Web browser, these notes also may be obtained from Lancaster University.

These are also available by anonymous ftp from markov.stats.ox.ac.uk [192.76.20.1] in directory pub/S (see the file README for current details) and on statlib (see the section "What is the statlib server? How can I access it?" for information on statlib).

Other Books

Other books which discuss particular aspects of S and S-PLUS include the following.

A. Bruce and H.-Y. Gao (1996), "Applied Wavelet Analysis with S-PLUS," Springer-Verlag, New York.
W. Cleveland (1993), "Visualizing Data," Hobart Press, Summit, NJ.
B. Everitt (1994), "A Handbook of Statistical Analyses using S-PLUS," Chapman & Hall, London.
W. Härdle (1991), "Smoothing Techniques with Implementation in S," Springer-Verlag, New York.
S. Huet, A. Bouvier, M.-A. Gruet, and E. Joliet (1996), "Statistical Tools for Nonlinear Regression: A Practical Guide with S-PLUS Examples," Springer-Verlag, New York.
A. Krause and M. Olson (1997), "The Basics of S and S-PLUS", Springer, New York.
A. Marazzi (1992), "Algorithms, Routines and S Functions for Robust Statistics," Wadsworth & Brooks/Cole.
P. Spector (1994), "An Introduction to S and S-PLUS," Duxbury Press, Belmont, CA.
W. N. Venables and B. D. Ripley (1997), "Modern Applied Statistics with S-PLUS," 2nd Ed., Springer, New York.

For readers of Japanese there are

M. Sibuya and R. Shibata (1992), "Data Analysis by using S," Kyouritsu Syuppan, Japan.
A 1991 translation of Becker, Chambers & Wilks (1988) by M. Sibuya and R. Shibata.
A 1994 translation of Chambers and Hastie (1992) by R. Shibata.

For readers of German there are

F. Böker (1997), "S-PLUS -- Learning by Doing: Eine Anleitung zum Arbeiten mit S-PLUS," Verlag Lucius & Lucius, ISBN 3-8282-0049-4.
A. Krause (1997), "Einführung in S und S-PLUS," Springer-Verlag, New York. ISBN 3-540-60932-6.
B. Süselbeck (1993), "S und S-PLUS: Eine Einführung in Programmierung und Anwendung," Gustav Fischer Verlag, Jena, New York. ISBN 3-437-40232-3.

What is the `statlib` server? How can I access it?

statlib is a system for distributing statistical software by electronic mail, ftp, and World Wide Web.

The easiest way to access statlib is using a Web browser (e.g. Mosaic) with a URL of http://lib.stat.cmu.edu/.

To access the statlib mail server, send a mail message to statlib@lib.stat.cmu.edu. For starters, send a message containing the following:

              send index
              send index from S

This will give you an index of the general and S-specific material available on the statlib server.

Remember that the server does not understand English or any other language. Your requests must be exactly in the form specified.

Anonymous ftp access is also available. Type ftp lib.stat.cmu.edu. At the login prompt, type statlib and give your e-mail address as the password.

A `mirror' of the statlib archive in the UK is available at unix.hensa.ac.uk. For details on the mail server, send email to netlib@unix.hensa.ac.uk with a body of send browser.

The site can also be accessed by telnet (log in as 'archive'), by anonymous ftp, or by WWW with the URL http://www.hensa.ac.uk/. The statlib archive is under /statlib.

Are archives of the `S-news` digests available?

Archives of the S-news digests are available at statlib, in the directory s-news, and these can be requested by e-mail or retrieved by ftp. There were 175 digests as of November, 1994.

You can search the digests by keyword. The format of the find command in an e-mail is:

find <digest_number> <keyword>[ <keyword>..] in s-news

For example, to search digest1 for the keywords `regression' and `transformation', mail the following to statlib:

find digest1 regression transformation in s-news

Note that the word "all" in place of <digest_number> will search all digests. The introductory message from statlib gives more details.

See the section "What is the statlib server? How can I access it?" for information on statlib.

Input and Output

What kinds of data can S read?

S comes with functions designed to read ASCII files. It also has the ability to invoke commands in the operating system and to interface with C and Fortran programs. These can be used to access data kept in other forms, in database management systems, etc.

How do I read data into S/S-PLUS from an ASCII file?

The scan() function can be used to read data from a text file or interactively from standard input. The function make.fields() can be used to create fields with a specified field separator so that the file can be used as input to scan().

The read.table() function reads an ASCII file and creates a data frame. For information about data frames, refer to the White book (see the section "What documentation is available for S, S-PLUS?").

S-PLUS 4.0 for Windows allows data import and export from a variety of file formats such as Excel, SAS, and SPSS. See the File:Import Data:From File menu item or the import.data() and export.data() functions.
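A minimal sketch of the two reading functions, assuming a small whitespace-separated file with a header line. The file contents and the names datafile, heights, and fields are invented for illustration, and the file is written first only so that the example is self-contained.

```r
# Hypothetical input file with a header line and two records.
datafile <- tempfile("heights")
cat("name height\n", "ann 1.62\n", "bob 1.80\n", file = datafile, sep = "")

# read.table() returns a data frame; header = TRUE takes the column
# names from the first line of the file.
heights <- read.table(datafile, header = TRUE)

# scan() returns the fields directly; what = list(...) gives the type of
# each field ("" for character, 0 for numeric), and skip = 1 steps over
# the header line.
fields <- scan(datafile, what = list(name = "", height = 0), skip = 1)
```

read.table() is usually the more convenient choice for rectangular data, while scan() gives finer control over how each field is interpreted.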

How can I write S data to an ASCII file?

The write() function allows you to write S data into a file in ASCII format.

The functions print(), format(), cat() and paste() can be used to format output to be written on to the files.

The sink() function allows you to divert the output of S/S-PLUS commands into a file.

The data.dump(), dump() and dput() functions write S objects into ASCII files but not in regular text format. They are used for data transfer between machines.

For more info, see the sections "Can S/S-PLUS objects be transferred from one machine to another?" and "When is dump() and restore()/source() preferable to data.dump()/data.restore()?".
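The writing functions above might be combined as in the following sketch. The vector x and the file names outfile and logfile are hypothetical; only write(), cat(), format(), sink(), and print() are used.

```r
x <- c(1.5, 2.25, 3)

# write() puts the values into an ASCII file, ncolumns values per line.
outfile <- tempfile("out")
write(x, file = outfile, ncolumns = 3)

# cat() with append = TRUE adds a formatted line to the same file.
cat("mean =", format(mean(x)), "\n", file = outfile, append = TRUE)

# sink() diverts printed output into a file until sink() is called
# again with no argument.
logfile <- tempfile("log")
sink(logfile)
print(x)
sink()
```

Forgetting the closing sink() is a common slip: output keeps going to the file and nothing appears at the terminal.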

Can S/S-PLUS objects be transferred from one machine to another?

S objects are stored as binary files for efficiency when they are accessed. Because these files contain hardware-dependent information (floating point representations, for example), they should not be moved directly from one machine to another unless you are sure that the underlying machine arithmetic and storage policies are identical.

The portable way to move S objects is to convert them to an ASCII file using the data.dump() function; the file can be moved to the new machine and the objects recreated using data.restore().

For more information, see the section "When is dump() and restore()/source() preferable to data.dump()/data.restore()?".

When is `dump()` and `restore()`/`source()` preferable to `data.dump()`/`data.restore()`?

The only advantage of dump()/restore() over data.dump()/data.restore() is that the ASCII file produced is easy to read and change. Thus dump is often used to produce ASCII files of S functions which are then edited and redefined using source or restore. However, for any S data objects that are not small, data.dump is recommended since it is faster and uses much less memory.

When dump() and restore() were created they were intended to accommodate the entire range of S data structures using the same syntax as the S language did. restore() parses and then evaluates each dump'ed object. If you have a matrix containing 10000 numbers, restoration of the file executes the c() function with 10000 arguments. That takes quite a bit of space to parse and evaluate.

data.dump() and data.restore() are designed to be used in the same way as dump() and restore() but deal with an ASCII representation that can be efficiently turned back to S objects.
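The dump-edit-redefine cycle for functions might look like the sketch below. The function square and the file name dumpfile are hypothetical. The output file is passed to dump() by position, since the name of that argument may differ between versions of S.

```r
# A small function to move or edit as text.
square <- function(x) x^2

# dump() writes an ASCII file containing an S assignment that
# recreates the named object.
dumpfile <- tempfile("square")
dump("square", dumpfile)

# Remove the object (as if on another machine), then redefine it
# by evaluating the dumped text.
rm(square)
source(dumpfile)

result <- square(4)
```

Between dump() and source() the file can be changed in any text editor, which is the advantage described above. For large data objects the same round trip is better done with data.dump() and data.restore().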

Why does `round()` sometimes not print rounded values?

There are two stages in rounding--the first step is producing an internal representation of the rounded value. For example,

> x <- .123450000001
> y <- round(x,3)

uses the machine's floating point arithmetic to produce the best approximation to the numeric value .123 in y. That's all the round() function does.

The next step is printing this value or incorporating it into a text string. It is at this stage when things can go astray. The S print function, invoked automatically when S objects are printed, tries hard to produce a pretty visual representation of the value being printed.

> x
[1] 0.12345
> y
[1] 0.123

(options(digits=) controls how many digits the print function thinks are important).

Other functions that convert numeric to character may not produce results as "pretty" as print does:

> as.character(x)
[1] "0.123450000001"
> as.character(y)  
[1] "0.123"

Depending on the machine's arithmetic, there may even be instances where as.character() (or cat() or paste()) will produce extra digits from a rounded value.

The solution is to use the function format() to turn the numeric value into a "pretty" character value:

> format(x)
[1] "0.12345"

This is particularly important in building character strings:

> paste("r = ",format(round(x, digits=4)))
[1] "r =  0.1235"

Functions in S

Are there any guidelines to writing functions in S?

Before you write a major function, check statlib to see if there is something similar to what you need (see the section "Where can I get contributed functions?" for information on obtaining contributed functions, and the section "What is the statlib server? How can I access it?" for information on statlib). A question to the S-news list with a brief description of what you want (see the section "Asking questions of the mailing list.") might elicit useful responses.

Some general guidelines for function writing are given below:

Use full names for arguments and function names; args can be abbreviated, so the full name doesn't hurt. There are lots of functions, so a good name is important.

Provide reasonable defaults for arguments.

Read current S code to see (some) examples of good style.

Start simply, get something working immediately and build capabilities gradually and interactively. Try to think of your computation in "whole data" terms. What is it trying to produce as a final result? Don't rush in to write it as a sequential Fortran algorithm.

Use self-checking computations while doing interactive data analysis with S. Try to think of ways of checking your work. For example,

sum(resid) == 0.

The browser() function can help in debugging your functions.

Try to deal with the most general situation if that is not too ugly: for example, NAs, character data, lists, and 0-length args, rather than just numeric vectors. On the other hand, it's easy to get too ugly. It's better to have a short, simple computation that handles 90% of all cases than to try to accommodate everything.

Do appropriate error checks on args if the standard error message is cryptic.

Try to avoid explicit loops if there are suitable primitives available that can do the job. (Note that some primitives also use loops, e.g. apply(). They are, however, likely to be written with more care than you might be willing to give.)

Be especially careful of building up a vector element by element in loops. When necessary, element by element computations should be done by creating an object and then replacing pieces of it rather than having an object grow by gluing together pieces.

Use comments where appropriate but save blocks of text for the online documentation. (You are writing online documentation, right?)

Graphics functions should change as little of the graphics state as possible. This allows the user (or function) that calls the graphics function to achieve its own specialization.

Use on.exit() to clean up -- graphical parameter changes, removing temp files, etc.
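
Several of the guidelines above (argument checks, filling a pre-allocated vector rather than growing one, thinking in "whole data" terms, and self-checking) can be sketched in one small example. The functions running.mean and running.mean2 are hypothetical and exist only for this illustration; they are not part of S.

```r
# Loop version: running mean of x over a window of the given width
running.mean <- function(x, width = 3)
{
    # Explicit checks give clearer messages than a failure deep inside
    if(!is.numeric(x)) stop("x must be numeric")
    if(width < 1 || width > length(x))
        stop("width must be between 1 and length(x)")

    n <- length(x) - width + 1
    result <- numeric(n)        # create the full-length object once...
    for(i in 1:n)
        result[i] <- mean(x[i:(i + width - 1)])   # ...then replace pieces of it
    result
}

# "Whole data" version of the same computation, using cumulative sums
running.mean2 <- function(x, width = 3)
{
    cs <- c(0, cumsum(x))
    n <- length(x) - width + 1
    (cs[(width + 1):(n + width)] - cs[1:n]) / width
}

# Self-check: the two versions should agree up to rounding error
x <- c(2, 4, 6, 8, 10)
check <- max(abs(running.mean(x) - running.mean2(x))) < 1e-10
```

The comparison uses a small tolerance rather than exact equality, since two floating point computations of the same quantity need not agree to the last bit.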

Where can I get contributed functions?

Contributed functions are available on statlib. Check the index on statlib to find out which functions are available. (See section What is the statlib server? How can I access it?, for info on statlib.)

Dynamic Loading in S

What is dynamic loading? When is it available?

Dynamic loading is implemented by the S function dyn.load. (See "The New S Language", Ch. 7, pgs 193-204). This function will take the object file (typically output by a C or Fortran compiler) and load it into memory so that it can be executed by the S functions .C() or .Fortran(). During the loading, dyn.load() attempts to resolve any references to other routines. These references can come from explicit subroutine calls or from implicit calls to library routines.

Unfortunately, the implementation of dyn.load() is difficult and dependent on hardware and the operating system, so the AT&T distribution of S provides it only for Vax and Motorola 68000-based architectures. AT&T does not supply dyn.load for Sun's Sparc architecture.

MathSoft provides a working version of dyn.load, dyn.load2, and/or dyn.load.shared for each Unix architecture on which S-PLUS runs.

For more information, see section What is the statlib server? How can I access it?.
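
A typical pattern is to wrap the .C() call in an S function and to load the object file once per session. The sketch below only defines such a wrapper. The file name vecsum.o and the routine name vecsum are placeholders for your own compiled code, and the assumed C routine takes three pointer arguments (the data, its length, and a one-element vector that receives the result); check the documentation for your platform for the exact C types expected.

```r
# Hypothetical wrapper around a user-supplied C routine called "vecsum".
# Nothing here is part of S itself except .C(), as.double() and as.integer().
vecsum <- function(x)
{
    out <- .C("vecsum",
              as.double(x),               # coerce so the storage mode matches the C code
              as.integer(length(x)),
              result = double(1))         # space for the value the routine returns
    out$result
}

# Load the compiled object before the first call (left commented out because
# the object file is hypothetical):
# dyn.load("vecsum.o")
# vecsum(c(1, 2, 3))
```

Coercing the arguments explicitly matters because .C() passes pointers to the S data, and a mismatch in storage mode is not checked on the C side.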

What is static loading and how does it differ from dynamic loading?

Static loading is another way of loading subroutines with S. Static loading creates a local version of S in your current working directory. Remember, however, that a copy of S requires about 6Mb of space and must be recreated whenever changes are made to S.

You would use static loading if:

You have subroutines that will almost always be used.
Your version of S does not support dyn.load.
You have extensive sets of subroutines and dyn.load()/dyn.load2() would be too slow. A particular example is when the loaded code depends on libraries and dyn.load2() complains about missing or duplicate symbols. Since dyn.load2() is slow, it may be faster in this case to use static loading instead of repeated calls to dyn.load2().

Can I call S/S-PLUS routines from within C?

Yes, you can use the function call_S within a C program to call an S/S-PLUS function. See section 7.2.4 of the blue book and the S-PLUS reference manual. Note that C code calling S must be linked into the S executable (via dynamic or static loading).

Graphics/Fonts

What graphics devices does S/S-PLUS support?

`S': S provides a device-independent model for graphics and supports batch output devices such as laser printers and phototypesetters that use the PostScript language, the pic preprocessor language for troff, ordinary character-based printers and terminals, Hewlett-Packard pen plotters that use the HP-GL language, graphics terminals using Tektronix and Hewlett-Packard conventions, and terminals/workstations running the X window system. Enhancers have added devices and features.
`S-PLUS': S-PLUS includes the graphics drivers that come with S, with the x11() device replaced by X11(). In addition, S-PLUS has two graphics drivers tailored to particular window managers: openlook() and motif(). The Windows version of S-PLUS supports the standard Windows graphics driver using win.graph(), and includes a printer driver win.printer(). S-PLUS 4.0 for Windows introduces a new Windows graphics driver graphsheet() which generates point-and-click editable graphics.

How do I find what graphics devices my S/S-PLUS supports?

Execute the expression

  help(Devices)

Does S/S-PLUS have a menu-based interface?

S-PLUS 4.0 for Windows provides a wealth of menu and dialog based functionality, including completely extensible and customizable menus and dialogs.

S-PLUS 3.4 for UNIX does not have built-in statistics menus, but does include tools for building menus and dialogs.

Does S have dynamic graphics? Does S-PLUS?

Most software to support dynamic graphics is tuned to a particular output device. Since S provides a device-independent graphical system, there are no dynamic graphics applications that are part of S. However, S has been used effectively as a platform from which device-dependent graphics code can be executed. In this case, S provides for data management, computations, etc., and hardware-specific routines are called to produce the dynamic displays. For users on Silicon Graphics machines, S provides library(brush) which implements brushing and point cloud rotation using SGI's gl library.

S-PLUS does have dynamic graphics using the X and sunview window systems; see its brush() and spin() functions.

How can I generate figures in S/S-PLUS for inclusion elsewhere?

S-PLUS 4.0 for Windows makes it easy to export graphics to a wide variety of formats through the File:Export Graph menu item.

For S-PLUS 3.4 for UNIX, a summary of comments by Bill Venables, Dave Smith and Brian Ripley follows.

The alternatives are either to produce PostScript directly from S/S-PLUS, or to go via a graphical representation such as that of fig (a public domain drawing package).

S has the postscript() driver, as in
```
postscript(file="1.eps", height=4, width=5, horiz=T, pointsize=8)
```
If you use postscript() directly, remember to call graphics.off() (or quit S) after finishing the plot calls. S-PLUS users can call postscript() via dev.print().
S-PLUS up to version 3.0 has the pscript() driver, which can be used either directly (with "onefile=F" and calling graphics.off() after use) or via
```
dev.print(pscript, onefile=F, print=F, ...)
```
With a windowing system and S-PLUS one can replace the `PostScript Print Command' in the graphics window by rmv filename and click on print. [Here rmv is a shell script with contents mv $2 $1.] If this is available, this is the easiest way.
To produce plots in the fig format use the fig() driver obtainable from statlib by send fig from S. (See section What is the statlib server? How can I access it?, for information on statlib).
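
Putting the first alternative together, a complete session that writes one figure to a PostScript file might look like the following sketch. The plotted data and the name fig.file are invented for the example; tempfile() is used only so that no existing file is overwritten.

```r
fig.file <- tempfile("fig")     # an arbitrary, unused file name
postscript(file = fig.file, height = 4, width = 5,
           horizontal = F, pointsize = 8)
plot(1:10, (1:10)^2, xlab = "x", ylab = "x squared")
graphics.off()                  # close the device so that the file is complete
```

Until graphics.off() has been called the file may be incomplete, so it should not be included in a document before that point.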

To include PostScript in TeX/LaTeX documents you need to consult the details of your dvi to ps program. Two macro packages, epsf and psfig, make the job much easier. Both are distributed with Tomas Rokicki's dvips, obtainable from labrea.stanford.edu in ~ftp/pub. In March 1993 the latest version was 5.514. Other versions of epsf and psfig are available for other dvi to ps programs, from a wide variety of archives. A wide range of PostScript editors are available, and cognoscenti can edit PostScript directly.

Fig-format plots can be edited with xfig and converted to Encapsulated PostScript (and a number of other formats) with fig2dev. (Both are now version 2.1). They are part of the X11R5 distribution, but can be obtained separately by anonymous ftp from export.lcs.mit.edu in the directory /pub/R5untarred/contrib/clients.

How can I plot complicated text using mixed fonts, etc?

Alan M. Zaslavsky has placed an archive of contributed collections on statlib (See section What is the statlib server? How can I access it?, for information on statlib) named postscriptfonts. A short description of the files is given below.

Functions to display PostScript fonts and, using the postscript() driver, to add text to a plot (or the margin of a plot) that contains mixed fonts (including Greek), mixed character sizes, and local motions (e.g., sub- and superscripts).

`fontdemo'
`ps.show.fonts': Generate displays of the fonts available.
`mixed.text'
`mixed.mtext': Functions for plotting of text containing different fonts, sizes, and local motions using within-text escape sequences to define these changes.
`mixed.text.vector'
`mixed.mtext.vector': Functions for plotting of text containing different fonts, sizes, and local motions using auxiliary vectors to define these changes.
`ps.preamble.ISO.LATIN': A postscript preamble with extended characters (may be used by calling postscript(preamble=ps.preamble.ISO.LATIN)).

Memory Management and Looping

What can I do when S runs slowly or out of memory while looping?

Brian Ripley created the following tutorial, which was posted to s-news on 7 Aug 92 and is archived in digest81 on statlib.

Tutorial Description -- memory problems in S

This is a series of hints on memory usage in S, from a real teaching example. We are running S-Plus 3.0 (August 1991 S) on Sun Sparcs; the examples were computed on an IPC with 12Mb ram, 32Mb swap on a local disc.

Brian Ripley
ripley@stats.ox.ac.uk

Consider a shoe experiment with 10 boys, an experiment reported in Box, Hunter & Hunter (1977), Statistics for Experimenters. There were two materials (A and B) that were randomly assigned to the left or right shoe:

shoes <- scan(,list(L=0, R=0))
13.2 14.0
8.2 8.8
11.2 10.9
14.3 14.2
11.8 10.7
6.6 6.4
9.5 9.8
10.8 11.3
9.3 8.8
13.3 13.6

attach(shoes)
t.test(L,R, paired=T)

The sample size is rather small, and one might wonder about the validity of the t-distribution. An alternative for a randomized experiment such as this is to base inference on the permutation distribution of d. Computation shows that the agreement is very good, but that computation causes problems in S.

Permutation Distributions

The most obvious way to explore the permutation distribution of the t-test of d = L-R is to select random permutations. The supplied S-Plus function t.test computes much more than we need, and so is rather slow (about 0.7 secs on a Sun SparcStation SLC). It is simple to write a replacement function to do exactly what we need.

attach(shoes)
d <- L-R
ttest <- function(x) mean(x)/sqrt(var(x)/length(x))
n <- 100
res <- numeric(n)
for(i in 1:n){
        x <- d*sign(runif(10)-0.5)
        res[i] <- ttest(x)
        print(c(i, memory.size()))
}

This took about 70 secs and used an additional 1Mb of memory! Increasing the sample size to 1000 causes seriously antisocial paging activity, and takes about 3 hours.

As the permutation distribution has only 2^10 = 1024 points we can explore it directly:

n <- 1024
perm.res <- numeric(n)
for(i in 1:n){
        j <- i; x<-d
        for(k in 1:10) {x[k] <- x[k]*(2*(j%%2)-1); j <- j%/%2}
        perm.res[i] <- ttest(x)
        print(c(i, memory.size()))
}
par(mfrow=c(1,2))
hist(perm.res, 25, probability=T, xlab="diff")
x <- seq(-4,4, 0.1)
lines(x, dt(x,9))
sres<- c(sort(perm.res), 4)
yres<- (0:1024)/1024
plot(sres, yres, type="S", xlab="diff", ylab="")
lines(x, pt(x,9), lty=3)
legend(-5, 1.05, c("Permutation dsn","t_9 cdf"), lty=c(1,3))

which took about 5 hours and 17Mb of memory.

The problem is that S does not release any memory until the loop is completed, so the memory usage is linear in the size of the loop. It may help to encapsulate the contents of the loop in a function:

n <- 100
res <- numeric(n)
test.t <- function(x){
      res <- ttest(d*sign(runif(10)-0.5))
      print(c(i, memory.size()))
      res
}
for(i in 1:n) res[i] <- test.t(x)

but the resources used in this instance are virtually unchanged. We can of course run 10 loops of length 100, provided these are done sequentially and not in a loop:

attach(shoes)
d <- L-R
ttest <- function(x) mean(x)/sqrt(var(x)/length(x))
test.t <- function(x){
      res <- ttest(d*sign(runif(10)-0.5))
      print(c(i, memory.size()))
      res
      }
	
res <- numeric(1000)
for(i in 1:100) res[i] <- test.t(x)
for(i in 101:200) res[i] <- test.t(x)
for(i in 201:300) res[i] <- test.t(x)
for(i in 301:400) res[i] <- test.t(x)
for(i in 401:500) res[i] <- test.t(x)
for(i in 501:600) res[i] <- test.t(x)
for(i in 601:700) res[i] <- test.t(x)
for(i in 701:800) res[i] <- test.t(x)
for(i in 801:900) res[i] <- test.t(x)
for(i in 901:1000) res[i] <- test.t(x)

but there is yet another catch. If this is run from a file with source() the memory is not released after each loop. It is necessary to put these instructions in a file t.test.s and use

Splus < t.test.s

or to cut-and-paste the instructions into a window running S. With an input file, it can be helpful to include

options(echo=T)

to echo the input. This approach can be taken as far as needed, even to listing each step of the loop on the file to minimize memory usage.

For extensive runs you may want to use the BATCH mode of S, run from the Unix command line as

Splus BATCH infile outfile

which runs a background job (at reduced priority 5) taking input from infile and writing output to outfile. (Note that options(echo=T) is set automatically in this mode.)

The corresponding command under Windows is

Splus /BATCH infile outfile
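
For the full permutation distribution specifically, the loop over the 1024 sign patterns can also be replaced by matrix arithmetic. The following is a sketch that is not part of the original tutorial and has not been timed on the machines described above; the names signs, X, xbar and ss are made up for the example. It computes the same statistic as ttest() for every sign pattern at once.

```r
# Shoe data from the tutorial and the paired differences
L <- c(13.2, 8.2, 11.2, 14.3, 11.8, 6.6, 9.5, 10.8, 9.3, 13.3)
R <- c(14.0, 8.8, 10.9, 14.2, 10.7, 6.4, 9.8, 11.3, 8.8, 13.6)
d <- L - R
m <- length(d)                       # 10 boys

# 1024 x 10 matrix of signs: row i holds the binary digits of i-1,
# mapped from 0/1 to -1/+1
idx <- 0:(2^m - 1)
signs <- matrix(0, 2^m, m)
for(k in 1:m) {                      # a loop over 10 columns, not 1024 rows
    signs[, k] <- 2 * (idx %% 2) - 1
    idx <- idx %/% 2
}

# Each row of X is one sign-flipped copy of d
X <- signs * matrix(d, 2^m, m, byrow = T)
xbar <- c(X %*% rep(1/m, m))                  # row means
ss <- c(X^2 %*% rep(1, m)) - m * xbar^2       # row sums of squares about the mean
perm.res <- xbar / sqrt(ss / (m - 1) / m)     # mean(x)/sqrt(var(x)/length(x)) by row
```

The last row of signs is all +1, so the final element of perm.res is the t statistic of the observed differences. The cost of this approach is the storage for the 1024 x 10 matrices, which is small here but doubles with every additional observation.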

Bootstrapping

Generating a bootstrap sample in S is very easy:

sample(x, replace=T)

draws length(x) items from x, sampling with replacement. The difficulty comes from the memory problems. We can avoid explicit loops by the following device:

n <- 100
A <- matrix(rep(d,n),10,n)
res <- apply(A,2,function(x) ttest(sample(x, replace=T)))
print(memory.size())

but this is once again subject to memory build-up, both to store A, which needs 8 x 10 x n bytes of storage, and for the internal for loop in apply. However, we can do the computation in chunks, from the command-line or an input file:

n <- 250
A <- matrix(rep(d,n),10,n)
res <- apply(A,2,function(x) ttest(sample(x, replace=T)))
res<- c(res, apply(A,2,function(x) ttest(sample(x, replace=T))) )
res<- c(res, apply(A,2,function(x) ttest(sample(x, replace=T))) )
res<- c(res, apply(A,2,function(x) ttest(sample(x, replace=T))) )

which used about 3Mb and took 7 minutes. The alternative approach of a series of four explicit for() loops uses essentially the same memory but took 40 seconds longer.
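
A variation on the same device, not part of the original tutorial, is to draw all the resampled values in a single call to sample() and arrange them as the columns of a matrix. This is only a sketch: its memory use has not been measured here, and apply() still loops internally. The name B is made up for the example.

```r
# Paired differences L - R from the shoe data
d <- c(-0.8, -0.6, 0.3, 0.1, 1.1, 0.2, -0.3, -0.5, 0.5, -0.3)
ttest <- function(x) mean(x)/sqrt(var(x)/length(x))

n <- 250                             # number of bootstrap samples in this chunk
m <- length(d)
# m*n independent draws with replacement; each column is one bootstrap sample
B <- matrix(sample(d, m * n, replace = T), m, n)
res <- apply(B, 2, ttest)
```

Because the draws are independent, filling the matrix column by column from one long sample is equivalent to calling sample(d, replace=T) once per column.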

However they are done, S is not very suitable for long series of simulations of small problems. These are better done by an external computer program or a special subroutine calling a Fortran or C program.

If S must be used, the ultimate approach is to split the problem up into small pieces and to launch a child S process for each part. It helps that assignments are permanent. The basic style is

unix("Splus < infile >> outfile", output.to.S=F)

and as an example, create a file named t.test.s containing the commands

for (i in 1:100) res1[i] <- ttest(sample(d, replace=T))
res <- c(res, res1)

then within Splus

attach(shoes)
d <- L-R
res1 <- numeric(100)
res <- numeric(0)
ttest <- function(x) mean(x)/sqrt(var(x)/length(x))
for(i in 1:10) unix("Splus < t.test.s >> junk", output.to.S=F)

This took about 4Mb (as 2 S processes run) and 9 minutes. Note that loops can safely be used here, as the memory build-up occurs in the child process. However, the overhead of launching the S child process is large, so the parts should be fairly large.

Data Manipulation

How do I create objects with similar names in a loop?

The assign() function may be used with paste() to create object names in a loop. For example, suppose we have a list my.list of length n and we want to create variables x1,...,xn each containing the component comp of the corresponding element of my.list. Because my.list[i] is a list of length one rather than the element itself, the element must be extracted with double brackets. This may be done using

     for (i in 1:n) assign(paste("x",i,sep=""), my.list[[i]]$comp)
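
A self-contained version of this idea is sketched below. The contents of my.list are invented for the illustration, and get() is shown as the counterpart of assign() for retrieving the objects again by name.

```r
# Hypothetical data: a list whose elements each have a component named comp
my.list <- list(list(comp = 1:3), list(comp = "a"), list(comp = c(2.5, 7)))
n <- length(my.list)

# Create x1, x2, x3
for(i in 1:n) assign(paste("x", i, sep = ""), my.list[[i]]$comp)

# Retrieve each object by its constructed name and add up the lengths
total <- 0
for(i in 1:n) total <- total + length(get(paste("x", i, sep = "")))
```

If the objects are only ever processed together, keeping them in the list and indexing it is often simpler than creating many similarly named objects.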

Miscellany

What causes the message: `Looking for..."function", ignored one...`?

This happens when an object you have created has the same name as a function somewhere later on your search list. S knows that it is looking for a function, so it ignores your object with the same name. To avoid warning messages such as this, rename or remove your object. Such messages often result from creating objects with simple names like "c", "q", or "t" that are also the names of standard S functions.

Of course, if you create a function in the working directory with the same name as a built-in function, then your function will be used instead of the built-in function.

S-PLUS has a function, called masked(), that lists objects in your .Data that have the same name as other objects on the search list. A similar function, called find(), lists all directories in which its argument name exists.
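
The lookup rule described above can be seen in a short session. The object t below is created only for the demonstration.

```r
t <- 5                      # a numeric object with the same name as the function t()
m <- matrix(1:6, 2, 3)
tm <- t(m)                  # S still finds the transpose function, skipping the numeric t
rm(t)                       # removing the object avoids the message in later calls
tm2 <- t(m)
```

Both calls return the 3 by 2 transpose; the only difference is whether S has to skip over an object of the wrong mode on the way to the function.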

How do I get information on running S within Emacs?

A nice way to run S in Unix is using the S-mode within the Emacs editor. Features include recall of past commands, a session log, and easy editing of functions and scripts.

A current version of the emacs-lisp software for running S-mode is available from statlib as /S/gnuemacs4. Using a Web browser, the most recent version of the software may be obtained from Lancaster University.

For questions on S-mode for Emacs, ask for help on the S-mode mailing list. To get information describing this mailing list send a message to majordomo@stat.math.ethz.ch with a body containing the line info S-mode. Send mail to S-mode-request@stat.math.ethz.ch in order to subscribe or unsubscribe.