[Rd] Another wishlist for R

Kevin Wright kwright at eskimo.com
Fri Jan 16 18:46:40 MET 2004

First, a big thanks to all of the developers and users that have worked to
make R such useful software.  It is only because I find the software so useful
that I have the following opinions.

A recent post to R-devel listed the 'Top 10 Features' for one person.  I found
it to be quite an interesting read.  Over the past couple of years I have
assembled my own lists.

Retrospective.  Some of my favorite things about R (vs. S-Plus)

1. Integration with emacs
2. Nice color handling
3. Wealth of packages, easy package updates
4. HTML help
5. More answers on R-help than S-news
6. Active developer community
7. Package creation tools
8. Functions: setwd, with, apropos

Prospective.  Periodically David Smith at Insightful asks users, "If you had
$100, how would you allocate that money to development?"  Without listing
dollar amounts, these are my personal choices for R.

1. Add "head" and "tail" to R base.  
   Patrick Burns has these: http://www.burns-stat.com/pages/public.html#genutil
   Very handy functions for checking data manipulation.
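   For anyone who has not used them, here is a rough sketch of the idea.
   The names first_rows and last_rows are made up for illustration and are
   not Patrick Burns's code; they only assume a data frame as input.

```r
# Hypothetical helpers (illustrative names, not Patrick Burns's functions):
# return the first or last n rows of a data frame.
first_rows <- function(x, n = 6) {
  x[seq_len(min(n, nrow(x))), , drop = FALSE]
}

last_rows <- function(x, n = 6) {
  k <- min(n, nrow(x))
  x[seq(to = nrow(x), length.out = k), , drop = FALSE]
}

dd <- data.frame(id = 1:20, y = (1:20)^2)
first_rows(dd, 3)   # rows 1 to 3
last_rows(dd, 3)    # rows 18 to 20
```

   A quick look at both ends of a data frame after a merge or sort is
   usually enough to catch a botched manipulation.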

2. Strive for self-contained examples in all .Rd files (as far as possible).
   Generally quite good, but there's always room for improvement.
   For R base, if I create examples, to whom should I send them (R-devel?) and
   how (as a request for change?)?
   Here's one example (by P. Dalgaard) for function 'replace'
     # Replace NAs in a data frame with -1
     dd <- data.frame(a=c(1,2,NA,4),b=c(NA,2,3,4)) 
     dd[] <- lapply(dd,function(x) replace(x, is.na(x), -1)) 

3. Encourage (more) standards for function names.  
   A prominent link on CRAN to the coding conventions would be good.
   Here is a draft of coding conventions:
   Partly as a result of the community development of R, the names of
   functions lack consistency.  Consider the following examples: 
     row.names, rownames
     browseURL, contrib.url, fixup.package.URLs
     package.contents, packageStatus
     mahalanobis, TukeyHSD
     getMethod, getS3method
   The sooner that conventions are encouraged, the more consistent future
   function names will be.

4. Increased integration of text and graphics output (for PDF, in particular).
   Sweave is fantastic for quality reporting, but can be a lot of work
   when a quick analysis is all that is needed.
   Often I would like to do something like print a box plot and include an
   anova table, for example: 
   I know of no such (simple) tools.  Ben Bolker has an idea here: 
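   Separately from that idea, here is a crude base-graphics sketch of the
   kind of quick report I mean.  It assumes the built-in PlantGrowth data
   and simply draws the printed anova table as monospaced text under the
   box plot; the file name and layout are arbitrary choices.

```r
# Sketch only: put a box plot and an anova table on one PDF page.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
layout(matrix(1:2, nrow = 2))            # plot on top, table below

boxplot(weight ~ group, data = PlantGrowth)

fit <- aov(weight ~ group, data = PlantGrowth)
tab <- capture.output(print(anova(fit))) # printed table as character lines

plot.new()                               # empty panel for the text
text(0, 1, paste(tab, collapse = "\n"),
     adj = c(0, 1), family = "mono", cex = 0.8)

invisible(dev.off())
```

   This works, but it is exactly the sort of hand assembly that a simple
   tool could hide.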

5. Drop unused factor levels by default.  (At least as a settable option.) 
   This issue has been debated before--I'm just adding my vote and justification.
   The proportion of time I want data to include unused factor levels is close
   to zero.  The amount of time I spend cleaning data to get rid of unused
   factor levels is quite substantial.
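   A small made-up example of the cleanup I keep doing.  Re-creating the
   factor with factor() is one way to drop the unused levels; the data here
   are invented for illustration.

```r
dd <- data.frame(variety = factor(c("A", "B", "C", "A")),
                 yield   = c(10, 12, 9, 11))

sub <- dd[dd$variety != "C", ]           # no rows with variety C remain
levels_before <- levels(sub$variety)     # "C" is still listed as a level
table(sub$variety)                       # so C shows up with a zero count

sub$variety <- factor(sub$variety)       # re-create the factor to drop it
levels_after <- levels(sub$variety)
```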

6. Expanded font control for graphics devices.
   This is already being considered, so again I'm just adding my vote.  See:

7. Clean up namespace implementation
   The introduction of namespaces has (for me) been a nuisance without
   any benefits that I am aware of.  I speak as a user, not a package maintainer. 
   I would like to see (1) more education about namespace benefits, (2) more
   discussions about what is the appropriate role for namespaces and (3) 
   improvements to the documentation, which is now often less correct (if not
   broken) due to namespaces.  For example, help(is.function) doesn't say how
   functions hidden behind namespaces will be treated.  Most help files
   completely ignore issues with namespaces.  
   Some people will say, "Of course namespaces are working exactly as
   expected!"  But that is only true if you expect functions to be 
   hidden; quite a few versions of R trained users to expect otherwise.
   The quiet introduction of namespaces has broken my modus operandi for:
   Namespaces may be neat/right from a language-design perspective, but have
   made it more frustrating for me to actually use the software. 

8. More consistency in the use of na.action and na.rm.
   Compare: mean(..., na.rm=...) and lme(..., na.action=...)
   Maybe na.action could be added to 'mean' and other functions.
   There are issues of compatibility with S-Plus here...
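   To make the contrast concrete, here is a small sketch with invented data.
   I use lm instead of lme only so the example runs without loading nlme;
   both take the same na.action argument.

```r
x <- c(1, 2, NA, 4, 5)
y <- c(2.1, 3.9, 6.0, NA, 10.2)

# Summary functions: a logical flag.
m_default <- mean(x)                 # NA, because x contains an NA
m_removed <- mean(x, na.rm = TRUE)   # mean of the four non-missing values

# Modelling functions: a function that filters the model frame.
fit <- lm(y ~ x, na.action = na.omit)
n_used <- length(residuals(fit))     # rows with both x and y present
```

   Two arguments, two types (logical vs. function), one underlying question:
   what should happen to the missing values?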

9. Add 'substitute' to getAnywhere 
   Actually, the code for getAnywhere already contains 'substitute', so it
   looks like the author intended for the function to work without a quoted
   argument. That would be wonderful.  Then why does
   getAnywhere("predict.lme") work but getAnywhere(predict.lme) does not work? 
   (Yet another namespace issue)

   I'm not the first person to ask this question.  Obviously I'm
   a member of the "blind" population that can't read help files:
   Another possibility is that the help file could be clearer for us blind
   folk that interpret "x: a character string or name" to mean that
   x might not be a character string. (See the help page) 
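   What I have in mind is roughly the following.  This is a sketch of the
   general 'substitute' idiom, not the actual source of getAnywhere, and
   as_name is a made-up helper name.

```r
# Hypothetical helper: accept either a quoted or an unquoted name and
# return it as a character string, without evaluating the argument.
as_name <- function(x) {
  expr <- substitute(x)                  # capture the argument unevaluated
  if (is.character(expr)) expr else deparse(expr)
}

as_name(predict.lme)     # "predict.lme"; the symbol is never looked up
as_name("predict.lme")   # "predict.lme"
```

   The price of this convenience is that a variable holding a name, as in
   f <- "predict.lme"; as_name(f), yields "f" rather than "predict.lme".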

10. More uniformity in quoting arguments.
    Uniformity outweighs cleverness/exceptions ("The Art of Unix Programming").
    Functions accepting non-quoted arguments
      find(replace) or find("replace")
    Functions requiring quoted arguments

    Some people have claimed "the designers of S knew what they were
    doing"  because you can do clever things like this:
    But we could just as easily be doing other clever things 
    and have more uniform quoting rules.  S is probably too mature for this to
    really be considered.

11. Have 'aggregate' add logical/default names to its value.
    I'm basically echoing this thread:
    Using aggregate(x,by,FUN), I would find it very useful if the factor names in
    the "by" list carried through to the final aggregate data.frame.  Also,
    when 'x' is a vector (and maybe for other data structures), it would be
    nice to have the original names included in the result.
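    A small illustration with invented data.  The behavior shown is what I
    see in the R I have at hand and may differ between versions: naming the
    elements of the "by" list carries the names through, but an unnamed list
    and a plain vector 'x' fall back to generic labels.

```r
dd <- data.frame(yield   = c(1, 2, 3, 4),
                 variety = c("a", "a", "b", "b"))

# Unnamed 'by' list and a plain vector for 'x': generic column names.
res_default <- aggregate(dd$yield, by = list(dd$variety), FUN = mean)
names(res_default)

# Naming the 'by' element and passing a one-column data frame keeps
# meaningful names in the result.
res_named <- aggregate(dd["yield"], by = list(variety = dd$variety),
                       FUN = mean)
names(res_named)
```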

12. Wanted: General-purpose mixed-models function/package
    The nlme library is very nice for mixed-effects models with nested
    effects, but it is not very general-purpose.  Even Bates/Pinheiro have said
    several times in posts to R-help/S-news that nlme was designed for nested
    models and using other models can be hard.
      Bates: "highly unintuitive" (crossed effects model)
      Bates: "algorithms for lme are tuned for nested random effects"
    For example, in nlme,
      The syntax for crossed random effects is quite intimidating.
      Try removing the variance component for Rep in: random=~1|Rep/WholePlot.
      Try changing a nested effect from random to fixed (or vice-versa).
      Try to extract lsmeans for fixed-effects in a model.
      Try to do a multiple-comparison of fixed-effects estimates.
      Try using AR1xAR1 error structure.  The nlme library appears to have 
        tools for this, but again is syntactically difficult.  I can find no
    Most of these tasks would ideally be straightforward in a general-purpose
    mixed-models function (as they are in SAS, Genstat, etc.)

    The ASREML software is available in S-Plus (and soon R, I'm told) via
    the proprietary 'samm' library.  Whereas lme seems excellent for basic
    nested-effects models and difficult for other models, samm excels at
    crossed-effects models, but doesn't have the plethora of useful 
    print, plot, extractor, and summary methods that are found in nlme.

13. The fantasy list.  Go ahead and tell me, "In your dreams!"
    Deprecate 'update'.  Cute, but makes session transcripts hard to read.
    Remove implicit intercepts in models.  Require y~1+x.  Force thinking about intercepts.
    Lattice colors could be more saturated for printing and projecting.
    Rename 'prompt' to something closer to its purpose, like makeSkeletonHelp.
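    On the intercept item, a small sketch of why the implicit form is easy
    to overlook.  The data are invented; model.matrix is used only to show
    which columns each formula produces.

```r
x <- c(1, 2, 3, 4)

cols_implicit <- colnames(model.matrix(~ x))      # intercept added silently
cols_explicit <- colnames(model.matrix(~ 1 + x))  # same model, spelled out
cols_none     <- colnames(model.matrix(~ 0 + x))  # intercept removed

cols_implicit
cols_explicit
cols_none
```

    The first two formulas give the same design matrix; only the second one
    says so.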

The humble opinion of one devoted user,

Kevin Wright
