[Rd] cbind() & rbind() for S4 objects -- 'Matrix' package changes
Martin Maechler
maechler at stat.math.ethz.ch
Fri Mar 23 18:31:35 CET 2007
Thank you, Luke, for your feedback,
(more inline below)
>>>>> "Luke" == Luke Tierney <luke at stat.uiowa.edu>
>>>>> on Tue, 20 Mar 2007 20:27:18 -0500 (CDT) writes:
Luke> On Tue, 20 Mar 2007, Martin Maechler wrote:
>> As some of you may have seen / heard in the past,
>> it is not possible to make cbind() and rbind() into proper S4
>> generic functions, since their first formal argument is '...'.
>> [ BTW: S3-methods for these of course only dispatch on the first
>> argument which is also not really satisfactory in the context
>> of many possible matrix classes.]
>>
>> For this reason, after quite some discussion on R-core (and
>> maybe a bit on R-devel) about the options, since R-2.2.0 we have
>> had S4 generic functions cbind2() and rbind2() (and default methods)
>> in R's "methods" which are a version of cbind() and
>> rbind() respectively for two arguments (x,y)
>> {and fixed 'deparse.level = 0' : the argument names are 'x' and 'y' and
>> hence don't make sense to be used to construct column-names or
>> row-names for rbind(), respectively.}
>>
>> We have been defining methods for cbind2() and rbind2()
>> for the 'Matrix' classes in late summer 2005 as well. So far so
>> good.
>>
>> In addition, [see also help(cbind2) ],
>> we have defined cbind() and rbind() functions which recursively call
>> cbind2() and rbind2(), more or less following John Chambers
>> proposal of dealing with such "(...)" argument functions.
>> These new recursively defined cbind() / rbind() functions
>> however have typically remained invisible in the methods package
>> [you can see them via methods:::cbind or methods:::rbind ]
>> and have been ``activated'' --- replacing base::cbind / rbind ---
>> only via an explicit or implicit call to
>> methods:::bind_activation(TRUE)
>>
>> One reason I didn't dare to make them the default was that I
>> noticed they didn't behave identically to cbind() / rbind() in
>> all cases, though IIRC the rare difference was only in the dimnames
>> returned; further, being entirely written in R, and recursive,
>> they were slower than the mostly C-based fast cbind() / rbind()
>> functions.
>>
>> As some Bioconductor developers have recently found,
>> these versions of cbind() and rbind() that have been
>> automagically activated by loading the Matrix package
>> can have a detrimental effect in some extreme cases,
>> e.g. when using
>> do.call(cbind, list_of_length_1000)
>> because of the recursion and the many many calls to the S4
>> generic, each time searching for method dispatch ...
>> For the bioconductor applications and potentially for others using cbind() /
>> rbind() extensively, this can lead to unacceptable performance
>> loss just because loading 'Matrix' currently calls
>> methods:::bind_activation(TRUE)
Luke> The recursion part is potentially problematic because stack space
Luke> limitations will cause this to fail for "relatively" short
Luke> list_of_length_1000, but that should be easily curable by rewriting
Luke> methods:::cbind and methods:::rbind to use iteration rather than
Luke> recursion.
Yes, thank you, Luke, something I should have at least tried
doing but didn't get to.
Luke> This might also help a little with efficiency by avoiding
Luke> call overhead. It would be interesting to know how much of the
Luke> performance hit is dispatch overhead and how much closure call
Luke> overhead. If it's dispatch overhead then it may be worth figuring out
Luke> some way of handling this with internal dispatch at the C level (at
Luke> the cost of maintaining the C level stuff).
Luke> My initial reaction to scanning the methods:::cbind code is that it is
Luke> doing too much, at least too much R-level work, but I haven't thought
Luke> it through carefully.
I can well understand that reaction..
The code started up quite small and easily understandable ...
until I started trying to emulate the "standard" cbind() /
rbind() behavior, notably about automatic colnames / rownames
creation. When delving into that the code got the current
partial messyness....
>> For this reason, we plan to refrain from doing this activation
>> on loading of Matrix, but propose to
>>
>> 1) define and export
>> cBind <- methods:::cbind
>> rBind <- methods:::cbind
>>
>> also do this for R-2.5.0 so that other useRs / packages
>> can start cBind() / rBind() in their code when they want to
>> have something that can become properly object-oriented
Luke> In mackage methods?
yes, inside methods, such that in fact there cBind <- cbind
(plus namespace export) would be sufficient.
In the mean time I'm less sure if this is desirable;
but at least this would ``expose'' the code and have it used
and consequently hopefully made more efficient (and hopefully
even simplified).
>> Possibly --- and this is the big RFC (request for comments) ---
>>
>> 2) __ for 'Matrix' only __ also
>> define and export
>> cbind <- methods:::cbind
>> rbind <- methods:::cbind
>>
>> I currently see the possibilities of doing
>> either '1)'
>> or '1) and 2)'
>> or less likely '2) alone'
>> and like to get your feedback on this.
>> "1)" alone would have the considerable drawback for current
>> Matrix useRs that their code / scripts which has been using
>> cbind() and rbind() for "Matrix" (and "matrix" and "numeric")
>> objects no longer works, but needs to be changed to use
>> rBind() and cBind() *instead*
>>
>> As soon as "2)" is done (in conjunction with "1)" or not),
>> those who need a very fast but non-OO version of cbind() / rbind()
>> need to call base::cbind() or base::rbind() explicitly.
>> This however would not be necessary for packages with a NAMESPACE
>> since these import 'base' automagically and hence would use
>> base::cbind() automagically {unless they also import(Matrix)}.
>>
>> We are quite interested in your feedback!
Luke> Either one seems cleaner to me than having loading of one package
Luke> result in mucking about in the internals of another.
I agree {{the reason I had chosen the unclean approach was the
hope that methods:::cbind could be improved to soon replace
base::cbind -- and so "Matrix" would just do something
that would happen more universally in the future anyway}}
Luke> If we are thinking of these as long term solutions then I think having
Luke> different names is cleaner, so 1) but not 2). If we are thinking of
Luke> this as a transition towards making base::cbind and base::rbind
Luke> support S4 dispatch via cbind2/rbind2 (assuming this can be done
Luke> efficiently) then there may be some merit to 2) to minimize the need
Luke> for code rewriting.
Yes, exactly.
My originally intent was strongly in the direction of making
base::cbind support S4 via cbind2() and rbind2() -- and one will
continue to be able to experiment with that after calling
methods:::bind_activation(TRUE).
The problem I had underestimated is
Luke> assuming this can be done efficiently
Luke> It might be worth experimenting with having .Internal(cbind(...))
Luke> check its arguments and call methods:::cbind if (Methods is loaded
Luke> and) any of the arguments are S4 -- as the S4 property is now cheap to
Luke> determine that may be very low cost especially if done after the
Luke> object bits have been checked with positive result.
Very nice idea ... currently I don't have the time to do it, so,
unless another R-core member (or an avid R-devel reader) jumps
in, I don't see this possible for R 2.5.0 (and hence the 2.5.x
series).
Too bad that nobody else (on R-devel) seems interested enough.
In the mean time, I'm proposing to implement '2)'
which will give
>> > library(Matrix)
>> Loading required package: lattice
>>
>> Attaching package: 'Matrix'
>>
>>
>> The following object(s) are masked from package:base :
>>
>> cbind,
>> rbind
every time Matrix is loaded, but that seems appropriate to me.
The question remains if it wasn't worth to do '1)' as well
{not in Matrix, but the 'methods' package} such that people can
more easily get experience with such a cbind2/rbind2 - based
version of cbind/rbind.
Martin
Luke> Best,
Luke> luke
>> Martin Maechler and Doug Bates <Matrix-authors at R-project.org>
Luke> --
Luke> Luke Tierney
Luke> Chair, Statistics and Actuarial Science
Luke> Ralph E. Wareham Professor of Mathematical Sciences
Luke> University of Iowa Phone: 319-335-3386
Luke> Department of Statistics and Fax: 319-335-3017
Luke> Actuarial Science
Luke> 241 Schaeffer Hall email: luke at stat.uiowa.edu
Luke> Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
More information about the R-devel
mailing list