[Rd] cbind() & rbind() for S4 objects -- 'Matrix' package changes
Martin Maechler
maechler at stat.math.ethz.ch
Tue Mar 20 23:19:54 CET 2007
As some of you may have seen / heard in the past,
it is not possible to make cbind() and rbind() into proper S4
generic functions, since their first formal argument is '...'.
[ BTW: S3-methods for these of course only dispatch on the first
argument which is also not really satisfactory in the context
of many possible matrix classes.]
For this reason, after quite some discussion on R-core (and
maybe a bit on R-devel) about the options, since R-2.2.0 we have
had S4 generic functions cbind2() and rbind2() (and default methods)
in R's "methods" which are a version of cbind() and
rbind() respectively for two arguments (x,y)
{and fixed 'deparse.level = 0' : the argument names are 'x' and 'y' and
hence don't make sense to be used to construct column-names or
row-names for rbind(), respectively.}
We have been defining methods for cbind2() and rbind2()
for the 'Matrix' classes in late summer 2005 as well. So far so
good.
In addition, [see also help(cbind2) ],
we have defined cbind() and rbind() functions which recursively call
cbind2() and rbind2(), more or less following John Chambers
proposal of dealing with such "(...)" argument functions.
These new recursively defined cbind() / rbind() functions
however have typically remained invisible in the methods package
[you can see them via methods:::cbind or methods:::rbind ]
and have been ``activated'' --- replacing base::cbind / rbind ---
only via an explicit or implicit call to
methods:::bind_activation(TRUE)
One reason I didn't dare to make them the default was that I
noticed they didn't behave identically to cbind() / rbind() in
all cases, though IIRC the rare difference was only in the dimnames
returned; further, being entirely written in R, and recursive,
they were slower than the mostly C-based fast cbind() / rbind()
functions.
As some Bioconductor developers have recently found,
these versions of cbind() and rbind() that have been
automagically activated by loading the Matrix package
can have a detrimental effect in some extreme cases,
e.g. when using
do.call(cbind, list_of_length_1000)
because of the recursion and the many many calls to the S4
generic, each time searching for method dispatch ...
For the bioconductor applications and potentially for others using cbind() /
rbind() extensively, this can lead to unacceptable performance
loss just because loading 'Matrix' currently calls
methods:::bind_activation(TRUE)
For this reason, we plan to refrain from doing this activation
on loading of Matrix, but propose to
1) define and export
cBind <- methods:::cbind
rBind <- methods:::cbind
also do this for R-2.5.0 so that other useRs / packages
can start cBind() / rBind() in their code when they want to
have something that can become properly object-oriented
Possibly --- and this is the big RFC (request for comments) ---
2) __ for 'Matrix' only __ also
define and export
cbind <- methods:::cbind
rbind <- methods:::cbind
I currently see the possibilities of doing
either '1)'
or '1) and 2)'
or less likely '2) alone'
and like to get your feedback on this.
"1)" alone would have the considerable drawback for current
Matrix useRs that their code / scripts which has been using
cbind() and rbind() for "Matrix" (and "matrix" and "numeric")
objects no longer works, but needs to be changed to use
rBind() and cBind() *instead*
As soon as "2)" is done (in conjunction with "1)" or not),
those who need a very fast but non-OO version of cbind() / rbind()
need to call base::cbind() or base::rbind() explicitly.
This however would not be necessary for packages with a NAMESPACE
since these import 'base' automagically and hence would use
base::cbind() automagically {unless they also import(Matrix)}.
We are quite interested in your feedback!
Martin Maechler and Doug Bates <Matrix-authors at R-project.org>
More information about the R-devel
mailing list