Documenting classes and methods: was [Rd] Re: R-devel Digest, Vol 3, Issue 23

Gordon Smyth smyth at wehi.edu.au
Sun Jun 8 03:59:39 MEST 2003


Many thanks for your helpful comments.

To over-simplify the previous discussion a bit, you are emphasising that 
all the aliases written by promptMethods will be needed by future versions 
of the help system, so it is important that they be included in .Rd files 
documenting methods. I was saying that these aliases are very verbose.

I am not the only one having trouble with the aliases. I have looked 
through all the R packages that I know of that use S4 methods and can't 
find even one example of a documented method which preserves all the 
promptMethods-style aliases. This may be partly because package authors 
aren't sure what they should be doing, but I think it has to be taken as a 
pretty strong statement from authors that the aliases don't produce a 
workable result with the online help system as it currently stands. The 
trouble I think is with the html package contents page, which very quickly 
becomes too long and cluttered to be useful if all the aliases are included.
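
To make the verbosity concrete, here is a minimal sketch in R of how the 
skeleton aliases multiply with the number of signatures. The alias forms 
\alias{foo-methods} and \alias{foo,bar1,bar2-method} are the ones discussed 
in this thread; the helper names 'methodAlias' and 'rdAliases' are my own 
inventions for illustration and are not part of the methods package.

```r
# Hypothetical helper: build the alias for one method signature,
# in the "generic,class1,class2-method" form discussed in this thread.
methodAlias <- function(generic, signature) {
  paste0(generic, ",", paste(signature, collapse = ","), "-method")
}

# Hypothetical helper: all the \alias lines a skeleton for 'generic'
# would carry, one for the generic plus one per signature.
rdAliases <- function(generic, signatures) {
  perMethod <- vapply(signatures,
                      function(s) methodAlias(generic, s),
                      character(1))
  c(sprintf("\\alias{%s-methods}", generic),
    sprintf("\\alias{%s}", perMethod))
}

methodAlias("foo", c("bar1", "bar2"))
# "foo,bar1,bar2-method"

# Two signatures already give three entries on the contents page.
writeLines(rdAliases("foo", list(c("bar1", "bar2"), c("bar1", "ANY"))))
```

Every line printed becomes an entry on the html contents page, which is 
why the page fills up so quickly.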

Could we satisfy both needs, (i) to have a unique help alias associated 
with each method and (ii) to have a package contents page which is 
readable, by giving authors control over which aliases are included on the 
contents page? Could we have a new command \aliasonly{} for the .Rd files 
for aliases which are to be available to the help system but not listed on 
the contents page? Authors could use \alias{} for aliases which are to be 
listed and \aliasonly{} for aliases which aren't.

It might also be helpful to have an optional argument, as in 
\alias[optional.text]{}, to choose the text which is listed on the package 
contents page.

Some other comments on generic functions are interpolated below.

Regards
Gordon

At 12:45 AM 27/05/2003, John Chambers wrote:
>Gordon Smyth wrote:
> >
> > I am another person who has had trouble documenting S4 classes and
> > (particularly) methods. The methods package itself is pretty cool by the
> > way, but it is a pity that there are as yet no guidelines on S4 in the
> > "Writing R Extensions" document.
> >
> > I have actually put together a guide on S4 documentation myself for the use
> > of my own lab which is at http://bioinf.wehi.edu.au/limma/Rdocs.html. I
> > don't pretend that the guide is perfect - I can already see problems with
> > it - but it has proved adequate so far for our own use (writing the limma
> > package) and has gained some more general acceptance from the Bioconductor
> > community.
> >
> > I found it hard to use the skeleton documentation provided by
> > promptMethods.
>
>The "structure" of the skeletons (the \alias lines especially) is
>intended to be used by the help system.  You're not meant to "use" these
>directly, much of the time.  It's the case that the tools to work with
>the .Rd structure haven't caught up yet, but please don't modify the
>skeleton's structure arbitrarily.
>
> > Suppose for example that I wish to document a method for
> > generic function 'foo' with argument list (x,y,...) for x of class 'bar1'
> > and y of class 'bar2':
> >
> > 1. The skeleton .Rd file contains \alias{foo-methods}. If two or more
> > packages document methods for 'foo', they'll all have the same alias entry,
> > and the help that a user will get by typing ?"foo-methods" will depend on
> > which package happens to have been loaded most recently.
>
>Good point, but related to the behavior of "?".  It's related to a
>number of other issues about multiple packages referring to the same
>generic function.  Not likely to change for 1.7.1, but likely to be
>different in several ways in 1.8.
>
> > 2. There seems to be no allowance for documenting extra named arguments for
> > this method which are not specified in the generic. There is no usage
> > entry, no argument list, and no process for R CMD check to check the
> > argument list against the definition of the method. In S3 one can write
> > \usage{\method{generic}{class}} and it would be nice to have an extension
> > of this facility for S4 methods. I have been abandoning the skeleton
> > structure produced by promptMethods and have been using \section{Usage} and
> > \section{Arguments}.
>
>Seems ok to have separate discussion of arguments, but don't "abandon"
>the rest of the material in the skeleton (see below).
>
>Heavy use of extra arguments in the methods is a little bit worrisome.
>There is an efficiency penalty, though not likely serious in sizable
>computations.  More basic (this is just my personal view), I like to
>think of the function as having a single conceptual definition--what it
>does and (by and large) what arguments it takes to describe what it
>should do.  Then the methods are the implementation.  The function
>description is likely what users, beginning users particularly, want to
>see.  More advanced users and programmers may also be concerned with the
>implementation.
>
>So, most of the time, one would like the function to define the
>arguments, and the methods to work from these.
>
>In some examples of extra arguments (the S3 print() methods, for
>instance), these are style-setting parameters, or perhaps control
>parameters for numeric computations.  It might be clearer in such cases
>to say that "..." is always passed to a (class-dependent)
>parameter-setting function.  Documenting that function is then a
>separate step.
>
>Again, this is just by way of what may help users to understand the
>functions and help designers to write functions cleanly; not suggesting
>you should be forced to take this route.

The need for extra arguments seems to increase with the complexity of the 
task which the generic function does. In R base most uses of generic 
functions are for language-type functions like 'print' or 'summary', 
although there are also data analysis functions like 'anova', 'residuals' 
and 'coefficients'. In Bioconductor we are trying to use S4 generic 
functions to undertake some quite complex data analysis tasks, for example 
the 'normalization' of a multi-array microarray experiment. Normalization 
means to adjust the data for unwanted systematic trends due to 
technological sources other than the genes and the treatments of interest. 
The generic function 'normalize' encapsulates a single unifying concept, 
but the implementation may differ considerably depending on the type of 
microarrays being used and the factors which seem important to adjust for.

We can and probably should consider using parameter-setting functions. But 
if we end up with a separate parameter-setting function for every class, 
and most users need to read the documentation for these functions, then we 
haven't really achieved a simplification.
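
As I understand the suggestion, the parameter-setting approach might look 
something like the sketch below. All the names here ('SpottedArray', 
'normalizeControl', 'normalizeArrays', 'centre', 'scale') are made up for 
illustration and are not taken from limma or any Bioconductor package.

```r
library(methods)

# Hypothetical class holding log-ratios for one array.
setClass("SpottedArray", representation(M = "numeric"))

# Hypothetical class-dependent parameter-setting generic.
setGeneric("normalizeControl",
           function(object, ...) standardGeneric("normalizeControl"))

# The control method for this class is where the extra arguments live.
setMethod("normalizeControl", "SpottedArray",
          function(object, centre = "median", scale = FALSE, ...) {
            list(centre = centre, scale = scale)
          })

# The analysis function keeps a minimal argument list and always hands
# "..." to the parameter-setting function.
normalizeArrays <- function(object, ...) {
  ctrl <- normalizeControl(object, ...)
  M <- object@M
  M <- M - if (ctrl$centre == "median") median(M) else mean(M)
  if (ctrl$scale) M <- M / max(abs(M))
  object@M <- M
  object
}

x <- new("SpottedArray", M = c(1, 2, 6))
normalizeArrays(x)@M                # -1  0  4
normalizeArrays(x, scale = TRUE)@M  # -0.25  0.00  1.00
```

The user still has to read the help page for 'normalizeControl' for each 
class to find out what can be passed through "...", which is the concern 
raised above.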

Another consideration which seems to increase the importance of the methods 
relative to the generic function itself is the modular package style of 
development being used in R. If I am the first author to use the function 
name 'normalize', do I have the right to specify all the arguments I need 
for this function as part of the generic definition, or should I minimize 
the specified arguments to give other authors as much flexibility as possible?

One example of documentation which I have been using as a model is the 
documentation in R base for the S3 generic function 'residuals'. The 
?residuals document explains the concept in generic terms and limits the 
argument list to 'object' and '...'. One can then read the separate 
method-documents ?residuals.lm or ?residuals.glm for other arguments. I 
would be happy if we could do something similar with S4 functions and methods.
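
The S4 analogue of the 'residuals' arrangement might be sketched as follows, 
assuming a current version of R with the methods package. The class 
'TwoColourArray' and the extra argument 'method' are hypothetical and chosen 
only for illustration; this is not the actual Bioconductor definition of 
'normalize'.

```r
library(methods)

# Generic with a minimal argument list, as for 'residuals'.
setGeneric("normalize",
           function(object, ...) standardGeneric("normalize"))

# Hypothetical class for illustration.
setClass("TwoColourArray", representation(M = "numeric"))

# The method adds a named argument which the generic does not specify.
# This is permitted because the generic has "..." in its argument list.
setMethod("normalize", "TwoColourArray",
          function(object, method = "median", ...) {
            centre <- switch(method,
                             median = median(object@M),
                             mean   = mean(object@M),
                             stop("unknown method: ", method))
            object@M <- object@M - centre
            object
          })

y <- new("TwoColourArray", M = c(1, 2, 6))
normalize(y)@M                   # -1  0  4
normalize(y, method = "mean")@M  # -2 -1  3
```

The help page for the generic would then describe only 'object' and '...', 
and the 'method' argument would be described on the page carrying the alias 
normalize,TwoColourArray-method.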

> > 3. The aliases for methods are pretty verbose and make the html contents
> > page for the package look rather cluttered. I have been deleting the
> > \alias{foo-methods} alias and been replacing \alias{foo,bar1,bar2-method}
> > with \alias{foo.bar1.bar2}. I know that using a syntactically valid name
> > for the alias has the potential problem that a function could actually
> > exist with that name, but I just like to use something shorter.
>
>Don't do that.  It's not what you like that counts, it's what works with
>the ? function, and your change will wipe out the ability of the help
>functions to identify correctly which method is being documented.
>
>For 1.8 (unfortunately, unlikely to be ironed out for 1.7.1), users
>should be able to get documentation on the method, say, for function
>f(x,y) corresponding to signature(x = "character", y = "numeric") by the
>expression
>   method ? f(x="character", y = "numeric")
>(or something along these lines).
>
>In any case, the \alias lines are crucial to going from any way of
>requesting method documentation to the correct documentation.
>
> > 4. There don't seem to be any guidelines for documenting a method with the
> > generic, if the generic happens to be defined in the same package, or with
> > the object class, if the generic dispatches on only one argument. I know
> > that you have thought about this, and in the document
> > http://developer.r-project.org/moreClassMethodIssues.html you refer to the
> > 'addTo' argument for 'promptMethods'. The 'addTo' argument however has not
> > yet been implemented in R.
> >
> > It would be nice to have a method for finding dynamically all available
> > documentation for methods for a given generic function. I wrote a little
> > prototype function called 'helpMethods' which simply extracts the list of
> > available methods and prompts the user for which help topic they'd like to
> > read. For this to work though, developers need to use a consistent alias
> > system for documenting methods. I haven't seen any package yet which is
> > using the aliases suggested by promptMethods.
> >
> > Do you think there is any value in my S4 documentation guide? Are there
> > errors or mis-understandings in it which should be corrected before it is
> > adopted as a guideline by Bioconductor?
>
>It's a useful document to have.  The whole area of documentation and
>online help is being worked on by a number of people, so there is the
>"moving target" difficulty.
>
>You mention in your document altering the output of the promptMethods
>skeleton.  Adding material, up to a point, is OK, but changing or
>deleting the "\" lines is not a good idea if you want the documentation
>to work with R's (evolving) help system.  As noted, the \alias lines
>should be left alone.
>
>There are a few other points we can discuss off-list, not directly
>related to this thread.
>
> > Are there major changes planned for the documentation system for S4 methods
> > and classes in R in the near future? Is it worth our while spending time
> > working out guidelines now or should we wait a bit until the situation
> > stabilizes?
>
>Commented on above--yes changes are in prospect.  Bioconductor may want
>to encourage documentation even before things settle down--really for
>the people in the project to assess whether guidelines are helpful at
>this point.
>
>As said, there will be some changes for 1.8, mostly additions to the
>code that processes the online help requests. It's a fairly good guess
>that the structure of \ lines, esp. the \alias lines, will be kept or
>extended, not radically changed, so keeping the current prompt output of
>these lines would be desirable.  If there are changes in the structure,
>you're more likely to see tools to modify what you have if you follow
>the current prompt output.
>
>In the longer run, it would be useful to have a documentation system
>based on a more modern form (e.g., XML), making possible more powerful
>online help software.  Duncan Temple Lang and others have done some good
>work on such systems.  My crystal ball is very foggy on what will happen
>with the R community in this direction.
>
>Regards,
>  John
>
> > Best wishes
> > Gordon
> >
> > >Date: Fri, 23 May 2003 15:37:50 -0400
> > >From: John Chambers <jmc at research.bell-labs.com>
> > >Subject: Re: [Rd] Documenting S4 classes; debugging them
> > >To: Duncan Murdoch <dmurdoch at pair.com>
> > >Cc: r-devel at stat.math.ethz.ch
> > >
> > >Duncan Murdoch wrote:
> > > >
> > > > 1.  I'm putting together my first package that uses S4 classes and
> > > > objects.  I'd like to document them, but I'm not sure what the
> > > > documentation should look like, and package.skeleton doesn't produce
> > > > any at all for the classes or methods.
> > >
> > >Hmm, sounds as if it should.
> > >
> > >Meanwhile, promptClass and promptMethods generate skeleton
> > >documentation.
> > >
> > > >
> > > > Are there any good examples to follow?
> > >
> > >The bioconductor packages (e.g, Biobase) have some examples.
> >
> > ...
> >
> > >John
> > >
> > > >
> > > > Duncan Murdoch
> > > >
> > > > ______________________________________________
> > > > R-devel at stat.math.ethz.ch mailing list
> > > > https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
> > >
> > >--
> > >John M. Chambers                  jmc at bell-labs.com
> > >Bell Labs, Lucent Technologies    office: (908)582-2681
> > >700 Mountain Avenue, Room 2C-282  fax:    (908)582-3340
> > >Murray Hill, NJ  07974            web: http://www.cs.bell-labs.com/~jmc
> >
> >
> > ---------------------------------------------------------------------------------------
> > Dr Gordon K Smyth, Senior Research Scientist, Bioinformatics,
> > Walter and Eliza Hall Institute of Medical Research,
> > 1G Royal Parade, Parkville, Vic 3050, Australia
> > Tel: (03) 9345 2326, Fax (03) 9347 0852,
> > Email: smyth at wehi.edu.au, www: http://www.statsci.org
> >
>
>--
>John M. Chambers                  jmc at bell-labs.com
>Bell Labs, Lucent Technologies    office: (908)582-2681
>700 Mountain Avenue, Room 2C-282  fax:    (908)582-3340
>Murray Hill, NJ  07974            web: http://www.cs.bell-labs.com/~jmc


