[Rd] cbind() & rbind() for S4 objects -- 'Matrix' package changes

Martin Maechler maechler at stat.math.ethz.ch
Tue Mar 20 23:19:54 CET 2007


As some of you may have seen / heard in the past,
it is not possible to make cbind() and rbind() into proper S4
generic functions, since their first formal argument is '...'.
[ BTW: S3-methods for these of course only dispatch on the first
  argument which is also not really satisfactory in the context
  of many possible matrix classes.]

For this reason, after quite some discussion on R-core (and
maybe a bit on R-devel) about the options, since R-2.2.0 we have
had S4 generic functions cbind2() and rbind2() (and default methods)
in R's "methods" package, which are versions of cbind() and
rbind(), respectively, for two arguments (x,y)
 {and fixed 'deparse.level = 0': the argument names are 'x' and 'y', so it
  makes no sense to use them to construct column names for cbind() or
  row names for rbind(), respectively.}
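
To illustrate the two-argument generics, here is a minimal sketch.
The class "Tagged" and its slots are hypothetical, made up for this
example only; they are not part of 'methods' or 'Matrix'.  It assumes
a version of R in which the cbind2() generic has formals (x, y, ...).

```r
library(methods)

## The default method falls back to the internal code with deparse.level = 0
x <- matrix(1:4, nrow = 2)
y <- matrix(5:8, nrow = 2)
xy <- cbind2(x, y)          # a 2 x 4 matrix without dimnames

## Hypothetical S4 class wrapping a matrix plus a label
setClass("Tagged", representation(m = "matrix", tag = "character"))

## Unlike S3 dispatch on the first argument only, the S4 method is
## selected on the classes of *both* x and y
setMethod("cbind2", signature(x = "Tagged", y = "Tagged"),
          function(x, y, ...)
              new("Tagged", m = cbind(x@m, y@m),
                  tag = paste(x@tag, y@tag, sep = "+")))

a <- new("Tagged", m = x, tag = "a")
b <- new("Tagged", m = y, tag = "b")
ab <- cbind2(a, b)          # dispatches to the method defined above
```

Methods for mixed signatures, e.g. (x = "Tagged", y = "matrix"), would
be defined in the same way.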

We defined methods for cbind2() and rbind2()
for the 'Matrix' classes in late summer 2005 as well.  So far so
good.

In addition, [see also  help(cbind2) ],
we have defined cbind() and rbind() functions which recursively call
cbind2() and rbind2(), more or less following John Chambers'
proposal for dealing with such "(...)" argument functions.
These new recursively defined cbind() / rbind() functions
however have typically remained invisible in the methods package
[you can see them via  methods:::cbind  or  methods:::rbind ]
and have been ``activated'' --- replacing  base::cbind / rbind ---
only via an explicit or implicit call to
     methods:::bind_activation(TRUE)
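
The recursive idea can be sketched roughly as follows.  Note that
cbind_sketch() is a hypothetical helper written for this illustration;
it is NOT the actual code of methods:::cbind and ignores details such
as 'deparse.level' and dimnames handling.

```r
library(methods)

## Hypothetical helper: reduce a "(...)" call to nested two-argument calls
cbind_sketch <- function(...) {
    args <- list(...)
    n <- length(args)
    if (n == 0L) return(NULL)
    if (n == 1L) return(cbind2(args[[1L]]))            # one-argument method
    if (n == 2L) return(cbind2(args[[1L]], args[[2L]]))
    ## bind the first argument to the (recursively bound) rest
    cbind2(args[[1L]], do.call(cbind_sketch, args[-1L]))
}

m <- cbind_sketch(1:2, 3:4, 5:6)   # ends up as cbind2(1:2, cbind2(3:4, 5:6))
```

Every level of the recursion goes through S4 dispatch on cbind2(), so
any class with cbind2() methods is handled without further changes.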

One reason I didn't dare to make them the default was that I
noticed they didn't behave identically to cbind() / rbind() in
all cases, though IIRC the rare difference was only in the dimnames
returned; further, being entirely written in R, and recursive,
they were slower than the mostly C-based fast  cbind() / rbind()
functions.

As some Bioconductor developers have recently found,
these versions of cbind() and rbind() that have been
automagically activated by loading the  Matrix package
can have a detrimental effect in some extreme cases,
e.g. when using
     do.call(cbind, list_of_length_1000)
because of the recursion and the many many calls to the S4
generic, each time searching for method dispatch ...
For the Bioconductor applications and potentially for others using cbind() /
rbind() extensively, this can lead to unacceptable performance
loss just because loading 'Matrix' currently calls
     methods:::bind_activation(TRUE)
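
A rough way to get a feeling for the overhead, without activating
anything, is to compare one call into the internal code with a chain
of pairwise cbind2() calls.  This is only a sketch under assumptions:
the list size of 200 is arbitrary, Reduce() stands in for the
recursion, and the timings will differ between machines and R
versions.

```r
library(methods)

cols <- replicate(200, runif(10), simplify = FALSE)  # 200 columns of length 10

## base: a single call into the internal C code
t_base <- system.time(m1 <- do.call(base::cbind, cols))

## pairwise: one S4 dispatch (and one copy of the growing result) per column
t_pair <- system.time(m2 <- Reduce(cbind2, cols))

## both approaches must give the same matrix
stopifnot(identical(dim(m1), dim(m2)), all(m1 == m2))

c(base = t_base[["elapsed"]], pairwise = t_pair[["elapsed"]])
```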

For this reason, we plan to refrain from doing this activation
on loading of Matrix, but propose to

1)  define and export
	    cBind <- methods:::cbind
	    rBind <- methods:::rbind

    also do this for R-2.5.0 so that other useRs / packages
    can start using cBind() / rBind() in their code when they want to
    have something that can become properly object-oriented

Possibly --- and this is the big  RFC (request for comments) ---

2) __ for 'Matrix' only __ also
   define and export
	    cbind <- methods:::cbind
	    rbind <- methods:::rbind

I currently see the possibilities of doing
 either '1)'
 or     '1) and 2)'
 or less likely  '2) alone'

and would like to get your feedback on this.

"1)" alone would have the considerable drawback for current
  Matrix useRs that their code / scripts which have been using
  cbind() and rbind() for "Matrix" (and "matrix" and "numeric")
  objects would no longer work, but would need to be changed to use
	rBind() and cBind()  *instead*

As soon as "2)" is done (in conjunction with "1)" or not),
those who need a very fast but non-OO version of cbind() / rbind()
need to call  base::cbind() or  base::rbind()  explicitly.
This however would not be necessary for packages with a NAMESPACE
since these import 'base' automagically and hence would use
base::cbind() automagically {unless they also import(Matrix)}.
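
The effect of the explicit base:: prefix can be sketched as follows.
The masking cbind() here is a hypothetical stand-in defined in a local
environment; it only mimics a package-level cbind() that is found
before the base one.

```r
res <- local({
    ## hypothetical stand-in for an object-oriented cbind() masking base's
    cbind <- function(...) "masked cbind() called"
    list(plain    = cbind(1:2, 3:4),         # finds the masking definition first
         explicit = base::cbind(1:2, 3:4))   # always the base version
})

res$plain      # result of the masking function
res$explicit   # an ordinary 2 x 2 matrix from base::cbind()
```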

We are quite interested in your feedback!

Martin Maechler and Doug Bates <Matrix-authors at R-project.org>


