[Rd] Wish List
Gabor Grothendieck
ggrothendieck at gmail.com
Tue Jan 1 20:59:57 CET 2008
Most of the items on this list have been mentioned before, but it
may be useful to see them all together, and in any case I have posted
my R wish list at the beginning of every year.
High priority items pertain to the foundations of R (promises,
environments) since those form the basis of everything
else and the foundation needs to be looked after first.
The medium items are focused on scripting since with a few additional
features R could work more smoothly with other software.
The low priority items cover the rest. They are not necessarily
low in terms of desirability, but I wanted to focus the high and
medium items on foundations and scripting.
There is also a section at the end focusing on add-on packages.
These may not, strictly speaking, be part of R but are widely used.
High
1. Some way of inspecting promises. It is possible to get
the expression associated with a promise using substitute but
not its environment. Also need a way to copy a promise without
forcing it. See:
https://stat.ethz.ch/pipermail/r-devel/2007-September/046966.html
2. Fix bug when promises are stored in lists:
f <- function(x) environment()
as.list(f(0))$x == 0 # gives error. Should be TRUE.
3. If a package uses "LazyLoad: true" then R changes the class of
certain top level objects. This does not occur if "LazyLoad: false"
is used. For an example see:
https://stat.ethz.ch/pipermail/r-devel/2007-October/047118.html
4. If two environment variables point to the same environment they
cannot have different attributes. This effectively thwarts subclassing
of environments (contrary to OO principles).
Medium
5. Sweave. A common scenario is spawning a Sweave job from
another program (such as from a program controlling a
web site). The caller needs to pass some information to the
Sweave program such as the file name of a report to produce.
It's possible to spawn R and have R spawn Sweave but given the
existence of R CMD Sweave it would be nice to be able to just
spawn R CMD Sweave directly. Features that would help here
would be:
- support --args or some other method of passing arguments
from R CMD Sweave line to the Sweave script
- have a facility whereby R CMD Sweave can directly generate
the .pdf file and an argument which allows the caller to
define the name of the resulting pdf file, e.g. -o. (With
automated reports one may need to have many different outputs
from the same Rnw file so it's important to name them differently.)
- an -x argument similar to Perl/Python/Ruby such that if one calls
R CMD Sweave -x abc myfile.Rnw then all lines up to the first one
matching the indicated regexp, abc here, are skipped. This
facilitates combining the script with a shell or batch file if the
preceding features are not enough.
Thus one could spawn this from their program:
R CMD Sweave --pdf myfile.Rnw -o myfile-123.pdf --args 23
and it would generate a pdf file from myfile.Rnw of the
indicated name passing 23 as arg1 to the R code embedded in the
Sweave file.
See:
https://stat.ethz.ch/pipermail/r-devel/2007-October/047195.html
https://stat.ethz.ch/pipermail/r-help/2007-December/148091.html
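Until something like this exists, the "spawn R and have R spawn Sweave" workaround mentioned above can be sketched roughly as follows. This is only an illustration: parse_report_args, run_report and the global variable report_args are hypothetical names, and the sketch assumes a LaTeX installation is available for tools::texi2dvi.

```r
# Hypothetical driver, e.g. invoked as:
#   Rscript driver.R myfile.Rnw myfile-123.pdf 23

# Split the trailing command-line arguments into the pieces the driver needs.
parse_report_args <- function(args) {
  if (length(args) < 2)
    stop("usage: Rscript driver.R input.Rnw output.pdf [args...]")
  list(rnw = args[1], pdf = args[2], extra = args[-(1:2)])
}

run_report <- function(args = commandArgs(trailingOnly = TRUE)) {
  a <- parse_report_args(args)
  # Name the intermediate .tex file after the requested pdf so that
  # texi2dvi writes a pdf of that name into the working directory.
  tex <- sub("\\.pdf$", ".tex", basename(a$pdf))
  # Assumption: the code chunks in the Rnw file read their arguments
  # from a global variable called report_args.
  assign("report_args", a$extra, envir = globalenv())
  utils::Sweave(a$rnw, output = tex)
  tools::texi2dvi(tex, pdf = TRUE)
}

# A real driver script would end with:
# run_report()
```

This needs a separate driver file and an extra R layer, which is exactly what direct --args and -o support on R CMD Sweave would remove.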
6. -x flag on Rscript as in perl/python/ruby. Useful for combining batch
and R file into a single file on non-UNIX systems. It would cause all
lines until a line starting with #!Rscript to be skipped by the R
processor. See:
https://www.stat.math.ethz.ch/pipermail/r-devel/2007-January/044433.html
Also see
http://www.datafocus.com/docs/perl/pod/perlwin32.asp#running_perl_scripts
since the same considerations as for Perl scripts apply.
There is also some discussion here:
https://stat.ethz.ch/pipermail/r-help/2007-November/145279.html
https://stat.ethz.ch/pipermail/r-help/2007-November/145301.html
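The skipping behaviour asked for here can be emulated from within R, which may help make the request concrete. The following is a minimal sketch, not a proposal for how Rscript should implement it; run_after_marker is a hypothetical name.

```r
# Evaluate only the part of a mixed batch/R file that follows the first
# line matching `marker`, mimicking what an -x flag would do.
run_after_marker <- function(path, marker = "^#!Rscript", envir = new.env()) {
  lines <- readLines(path)
  start <- grep(marker, lines)[1]
  if (is.na(start)) stop("marker line not found in ", path)
  code <- lines[-seq_len(start)]          # drop the preamble and the marker line
  eval(parse(text = code), envir = envir)
  invisible(envir)
}

# Example: a file whose first two lines are meant for the Windows batch processor.
tmp <- tempfile(fileext = ".bat")
writeLines(c("@echo off", "Rscript %0 %*", "#!Rscript", "y <- 1 + 1"), tmp)
e <- run_after_marker(tmp)
```

Of course this still requires R to be started with some other script first, which is why a built-in flag would be preferable.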
Low
7. Define Lag <- function(x, k = 1, ...) lag(x, -k, ...)
so the user has his choice of which orientation he prefers.
Many packages could make use of it if it were in the core of R including
zoo, dyn, dynlm, fame and others. This would also address comments
such as in ISSUE 4 on this page which is associated with a popular
book on time series:
http://www.stat.pitt.edu/stoffer/tsa2/Rissues.htm
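To illustrate the two orientations, here is a small sketch using a plain ts object. It calls stats::lag explicitly only to avoid clashes with other packages that define lag; that is a choice made for this example.

```r
# Lag() as proposed above: positive k moves the series forward in time.
Lag <- function(x, k = 1, ...) stats::lag(x, -k, ...)

x <- ts(1:5)              # observed at times 1..5
lead_tsp <- tsp(stats::lag(x, 1))   # time base shifted back by one
lag_tsp  <- tsp(Lag(x, 1))          # time base shifted forward by one
```

With lag() the value observed at time t is relabelled t - 1, whereas with Lag() it is relabelled t + 1, which is the convention many users expect.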
8. On Windows, package build tools should check that Cygwin is in the
correct position on PATH and issue a meaningful error if not. If
you get this wrong currently it's quite hard to diagnose unless
you know about it.
9. Implement the R shell() and shell.exec() commands on
non-Windows systems.
10. print.function should be improved to make it obvious how to find what
the user is undoubtedly looking for in both the S3 and S4 cases.
That would address one of the criticisms here:
http://www.stat.columbia.edu/~cook/movabletype/archives/2007/08/they_started_me.html
(The other criticisms at this link are worth addressing too -- ggplot2
and several existing or upcoming books on grid, lattice and ggplot
R graphics are presumably addressing the criticism that creating
graphics is difficult in R.)
11. Add { to the derivative table so one can write this:
f <- function(x) { x*x }
deriv(body(f), "x", func = TRUE)
Currently, one must do:
deriv(body(f)[[2]], "x", func = TRUE)
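In the meantime the unwrapping can be hidden in a small helper. This is a sketch only; strip_braces is a hypothetical name and it handles just the case of a brace block containing a single expression.

```r
# Unwrap `{ expr }` (possibly nested) when it holds exactly one expression.
strip_braces <- function(e) {
  while (is.call(e) && identical(e[[1]], as.name("{")) && length(e) == 2) {
    e <- e[[2]]
  }
  e
}

f <- function(x) { x * x }
g <- deriv(strip_braces(body(f)), "x", function.arg = TRUE)
val <- g(3)   # value of f at 3, with the gradient attached as an attribute
```

A body with several statements would still need real support for { in the derivative table.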
12. as.Date.numeric should have Epoch as default origin. It's currently
asymmetric since as.numeric(as.Date(x)) does not require specifying
the Epoch yet you do have to specify it in the reverse direction.
dd <- Sys.Date()
x <- as.numeric(dd) # ok
as.Date(x) # error
Even worse, sapply (and also ifelse and likely some other functions)
unclass your dates and you should not have to know the origin to get
them back with as.Date .
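For reference, the round trip currently looks like this when the origin is spelled out. The fixed date is used only to make the example reproducible.

```r
epoch <- "1970-01-01"            # the origin one currently has to know

dd <- as.Date("2008-01-01")
x <- as.numeric(dd)              # ok, no origin needed in this direction
back <- as.Date(x, origin = epoch)

# sapply drops the Date class, so the origin is needed again to recover dates.
dates <- c(dd, dd + 1)
unclassed <- sapply(dates, function(d) d)
restored <- as.Date(unclassed, origin = epoch)
```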
13. In traceback(), environments in the calling sequence are listed
simply as <environment> so we don't know which environment is being
referenced. It would be more helpful if the hash code associated
with the environment were listed. Also it would be useful if it were
possible to inspect an environment given its hash code as this might
help the programmer determine which environment is being referenced.
14. These provide results which are not valid R:
> dput(alist(a=1,b=))
structure(list(a = 1, b = ), .Names = c("a", "b"))
> dput(alist(a=1,b=),control="all")
structure(list(a = 1, b = quote()), .Names = c("a", "b"))
15. There should be a "LazyLoad: auto" or similar that corresponds to
omitting LazyLoad in the DESCRIPTION file. Currently there is no
explicit argument setting that corresponds to the default LazyLoad
setting.
16. A facility to directly get the hex code for an environment. Currently
one must do:
capture.output(new.env())
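The capture.output workaround can be wrapped up as below. This is a sketch that relies on the printed form "<environment: 0x...>" and so is fragile by design; env_address is a hypothetical name.

```r
# Extract the hex code from the printed representation of an environment.
env_address <- function(env) {
  stopifnot(is.environment(env))
  header <- capture.output(print(env))[1]
  sub("^<environment: (.*)>$", "\\1", header)
}

e <- new.env()
addr <- env_address(e)
```

Parsing printed output like this is what a built-in accessor would make unnecessary.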
17. stats:::nlsModel. Would like option to NOT calculate derivatives so
it can be used with derivative-free algorithms.
18. Make as.POSIXlt, difftime, filter, rowMeans and rowSums into S3 generics.
Various time series and datetime packages such as fame, zoo and chron
could use these.
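What packages have to do today is mask the base function with their own generic, roughly as sketched here for rowMeans. The class "myseries" and its layout are made up for the example.

```r
# Mask base::rowMeans with an S3 generic and fall back to the original.
rowMeans <- function(x, ...) UseMethod("rowMeans")
rowMeans.default <- function(x, ...) base::rowMeans(x, ...)

# Hypothetical series class: a list holding a data matrix and an index.
rowMeans.myseries <- function(x, ...) base::rowMeans(unclass(x)$data, ...)

m <- matrix(1:4, 2)
s <- structure(list(data = m, index = 1:2), class = "myseries")
plain <- rowMeans(m)
series <- rowMeans(s)
```

If each package does this independently the masks can conflict, which is the argument for making these functions generic in R itself.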
Packages
========
19. DBI.
- It should be possible for a program to discover which DBI drivers
are loaded.
- A DBI driver for ODBC would be nice.
20. RSQLite
- automatically set eol correctly in sqliteImportFile according to
input file (currently it defaults to "\n" which only works correctly
for files created on UNIX)
- either give sqliteImportFile its own .Rd page or make it possible
to read ?dbWriteTable without reference to sqliteImportFile
- ability to use arbitrary R functions from within sqlite select statements
(or second best would be to at least support common functions used in
statistics that are not in sqlite such as sd, var, etc.)
- sqliteImportFile should support quoted fields so that .csv files,
the most common input file format, can be supported
See: https://stat.ethz.ch/pipermail/r-sig-db/2007-August/000382.html
https://stat.ethz.ch/pipermail/r-sig-db/2007-August/000384.html
21. MySQL
- dbWriteTable on Windows inserts extra "\r" characters
See: https://stat.ethz.ch/pipermail/r-sig-db/2007-August/000385.html
22. grid
- grid.ls() enhancement to show more info (if grid.ls is analogous to ls
on UNIX then this would be analogous to ls -l) and a grep-like facility
which only lists grid objects with specified attributes, e.g. just list
all grid objects that are dark green
- ability to reset grid names so generating the same plot twice gives
the same sequence of grid names.
My previous wish lists are here:
https://www.stat.math.ethz.ch/pipermail/r-devel/2007-January/044122.html
https://www.stat.math.ethz.ch/pipermail/r-devel/2006-January/035949.html
https://www.stat.math.ethz.ch/pipermail/r-help/2005-January/061984.html
https://www.stat.math.ethz.ch/pipermail/r-devel/2004-January/028465.html