problem with sub in graphs

Martin Maechler <maechler@stat.math.ethz.ch>
Mon, 22 Jun 1998 10:43:49 +0200


>>>>> "PD" == Peter Dalgaard BSA <p.dalgaard@biostat.ku.dk> writes:

    PD> Colin Farrow <C.Farrow@geology.gla.ac.uk> writes:
    >> I understand the problem now.  Clearly a more systematic approach to
    >> the use of ... is required.  In the current example with plot and
    >> hist the problem arises because parameters of the title function are
    >> being passed to other low level graphical functions.  The solution
    >> therefore is that only global graphical parameters such as col, are
    >> passed via ... and function specific parameters like sub need to be
    >> specified in the function definition. Hence in the current situation
    >> plot.default and hist.default require a sub= argument which should
    >> be passed only to the title function. It currently is not, and yes
    >> it would be handy to be able to turn off the warnings.

    PD> Wait a minute.

    PD> Am I the only one sensing that something is badly amiss with that
    PD> logic? I agree that that is the essence of the problem, but the
    PD> solution seems wrong.

Hmm, maybe not for this particular case.
In the light of my story below, I am convinced that code can be made less
error-prone and more robust by using as little "..." as possible
inside basic R functions.

    PD> The purpose of '...' is that a function does not  need to know
    PD> about certain parameters that are being handled by lower level
    PD> routines. It can just pass them along.

Yes,  but......

The problem here is really about catching user typos.
S-plus has been quite lazy here, which has driven people to despair
more than once in my experience:

Consider the person (especially non-pro statistician) who does

	summary(glm(y~x,familiy=poisson(link="identity")))

and then tries to make sense of what [s]he gets.
	(This is a real story, not made up!  
	 I don't remember how many man hours were lost, but > 2)

In R, this gives a more or less helpful error message ("... unused argument to function"),
whereas in S-plus there is NO error message and a result that looks quite
reasonable  --- but is just wrong:
Since S has  glm <- function(..........,  ...)
				          ###
the  ``familiy'' is just put into the  "..." part,
	    ^
passed on to glm.fit and never seen there ---
resulting in the default family, i.e. gaussian, being used, which can give
quite reasonable-looking results...  but the inexperienced user will
have a VERY hard time finding out.

Note: The above behavior is not a basic merit of R or S(-plus),
      it just happens that glm() in R was written  WITHOUT "..."
      and glm() in S-plus has a "..."
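
To make the difference concrete, here is a minimal sketch in R. The names
fit_low, fit_lazy and fit_strict are hypothetical stand-ins of mine, not the
real glm() or glm.fit() code; they only mimic the two styles of signature.

```r
# Hypothetical low-level fitter: knows 'family', ignores the rest of "...".
fit_low <- function(y, family = "gaussian", ...) {
  family
}

# Style A: the wrapper has "..." and simply passes it along.
# A misspelled 'familiy' travels through "..." and is never looked at.
fit_lazy <- function(y, ...) {
  fit_low(y, ...)
}

# Style B: no "..." anywhere, so R itself rejects unknown argument names.
fit_strict <- function(y, family = "gaussian") {
  family
}

y <- c(1, 3, 2)

# The typo goes unnoticed and the default family is used.
lazy_result <- fit_lazy(y, familiy = "poisson")

# The same typo is an error here; we catch it to inspect it.
strict_result <- tryCatch(
  fit_strict(y, familiy = "poisson"),
  error = function(e) e
)
```

With style A the call returns the default family without any complaint;
with style B the same call stops with an "unused argument" error.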

My own lesson learned from the above story:

	1) Use "..." sparingly in functions you write
	2) Still use "<arg>=NULL" arguments which you pass to lower-level
	   functions, in order to use "..." only once, if possible.
	3) If you pass "..." to lower level functions,
	   they SHOULD be checking  ALL of "..." eventually
	   and tell if there were arguments there with ``wrong'' tags (/names).

	Otherwise, typos in argument names will not be detected and the
	arguments silently dropped.  
	I think this would be too lazy for ``robust'' programming.
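
A rough sketch of points 2) and 3) in R, under my own assumptions:
check_dots, draw_title and my_plot are hypothetical names invented for this
illustration and are not existing R functions.

```r
# Hypothetical helper: stop if 'dots' carries names outside 'allowed'.
check_dots <- function(dots, allowed) {
  nms <- names(dots)
  if (is.null(nms)) nms <- rep("", length(dots))
  bad <- setdiff(nms[nzchar(nms)], allowed)
  if (length(bad) > 0) {
    stop("unknown argument(s): ", paste(bad, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Hypothetical lower-level routine; it just returns what it was given.
draw_title <- function(main = NULL, sub = NULL) {
  list(main = main, sub = sub)
}

# Point 2): 'sub' is an explicit "<arg>=NULL" argument.
# Point 3): everything left in "..." is checked before being passed on.
my_plot <- function(x, sub = NULL, ...) {
  check_dots(list(...), allowed = c("main"))
  draw_title(sub = sub, ...)
}

ok  <- my_plot(1:3, sub = "a subtitle", main = "a title")
bad <- tryCatch(my_plot(1:3, mian = "a title"), error = function(e) e)
```

Here the misspelled 'mian' is reported instead of being silently dropped.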

    PD> This is very practical for modular programming in that you can just
    PD> (e.g.) add an option to a low level function without having to
    PD> register it with all other 'upstreams' functions (in different
    PD> packages and whatnot). Of course, you still have to be careful
    PD> about the argument names but that's fairly easy in practice.

    PD> Suppose that a function needs to call *two* such routines. Then why
    PD> should it suddenly be made responsible for keeping track of which
    PD> arguments make sense to which routine? I'd say that the only
    PD> sensible paradigm is that routines that get called with '...'
    PD> should simply disregard any arguments they do not understand what
    PD> to do with.

I agree that this is very handy in several cases.
However, in the light of my point "3)" above,
I think we should be more particular here.

Maybe the lowest level functions  (R or C, I'm not so sure anymore)
should check the names / tags of "..."
and only silently disregard the names if they belong to a subset of
``registered'' names.

In the case of graphics, i.e. plot(...) and par(...),
this could be made a well-defined and accessible set.
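
One possible shape for such a registry, as a sketch only: the vector
registered_gpars and the function draw_line below are hypothetical, and the
short list of names is merely a sample, not the actual set known to par().

```r
# Hypothetical registry of graphical parameter names (a small sample).
registered_gpars <- c("col", "lty", "lwd", "cex", "pch")

# Hypothetical lowest-level routine that itself only uses 'col'.
# Registered names it has no use for are dropped silently;
# anything else in "..." produces a warning.
draw_line <- function(x, col = "black", ...) {
  dots <- list(...)
  nms <- names(dots)
  if (is.null(nms)) nms <- rep("", length(dots))
  unknown <- setdiff(nms[nzchar(nms)], registered_gpars)
  if (length(unknown) > 0) {
    warning("unrecognized argument(s) ignored: ",
            paste(unknown, collapse = ", "), call. = FALSE)
  }
  col
}

# 'pch' is registered, so it is disregarded without any message.
quiet <- tryCatch(
  draw_line(1:3, col = "red", pch = 2),
  warning = function(w) NA_character_
)

# 'colour' is not registered, so the typo is reported.
caught <- tryCatch(
  draw_line(1:3, colour = "red"),
  warning = function(w) conditionMessage(w)
)
```

This keeps the convenience of passing "..." along while still catching
names that no routine has registered.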

    PD> (I'm also not very happy with the news that S-4 will be enforcing
    PD> identical calling sequences for generic methods, but that's another
    PD> matter)
yes, indeed  [2 x]

Martin Maechler <maechler@stat.math.ethz.ch>			<><
Seminar fuer Statistik, ETH-Zentrum SOL G1;	Sonneggstr.33
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1086
http://www.stat.math.ethz.ch/~maechler/