[Bioc-devel] GenomicFiles reducer and iterate argument
Valerie Obenchain
vobencha at fhcrc.org
Wed Jun 18 22:58:17 CEST 2014
We'll try a single arg to REDUCER and see how it goes.
BTW I'm also going to swap out DataFrame for Vector in the rowData.
DataFrame has been more difficult than anticipated (storing names,
subsetting to get ranges out) and doesn't give any clear advantage over
Vector.
Val
On 06/17/2014 02:59 PM, Michael Lawrence wrote:
> I think there are two different use cases here. The first, the one that
> I think is driving the design, is that the user writes a function for a
> particular problem, where the value of iterate is known. The other use
> case is that the user gets a summary function from somewhere else (a
> package) and applies it using reduceBy*. In that case, the user would
> potentially need to write a wrapper, depending on the formals of the
> reusable function. The only way I could make the second use case work
> with the current design is to have a higher order function that returns
> a universal iterator that detects the value of iterate via nargs() and
> behaves appropriately. The higher order function would not need to be
> known to the user, just the package developer.
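Below is a minimal sketch of the higher-order function described above, in plain base R. The name universalReducer is hypothetical and is not GenomicFiles API. The sketch assumes the framework calls REDUCER with two arguments when iterate=TRUE and with a single list when iterate=FALSE.

```r
# Hypothetical helper, not part of GenomicFiles: wrap a reusable summary
# function FUN so that one wrapper serves both iteration modes.
universalReducer <- function(FUN) {
    function(...) {
        if (nargs() == 1L) {
            # One argument: assumed to be the list of all MAPPER results
            # (iterate=FALSE), so spread it into FUN's arguments.
            do.call(FUN, ..1)
        } else {
            # Two arguments: pairwise combination (iterate=TRUE).
            FUN(...)
        }
    }
}

REDUCER <- universalReducer(sum)
REDUCER(1, 2)                     # pairwise call
REDUCER(list(1, 2, 3))            # single list call
Reduce(REDUCER, list(1, 2, 3))    # pairwise iteration, as with Reduce()
```

The package developer would supply the wrapper, and the user would pass an existing function such as sum. The trade-off is one nargs() test on every call.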
>
>
>
> On Tue, Jun 17, 2014 at 1:39 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
> Val's out today and I'm at least part of the problem so...
>
>
> On 06/17/2014 10:13 AM, Michael Lawrence wrote:
>
> On Tue, Jun 17, 2014 at 7:00 AM, Valerie Obenchain <vobencha at fhcrc.org> wrote:
>
> Hi Michael, Ryan,
>
> Yes, it would be ideal to have a single signature for both cases of
> 'iterate'. We went over the pros/cons again and at the end of the day
> decided to keep things as they are. No perfect solution here.
>
> These were the primary points:
>
> - The disadvantage of defining REDUCER with only '...' is that '...' can
> represent variables other than just the output from MAPPER.
>
>
> Do you mean that "..." will capture additional arguments? From where?
>
>
> reduceBy* takes a '...' argument, and this is currently available to
> both MAPPER and REDUCER; see below.
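To make the concern concrete, here is a toy driver in base R that forwards its own '...' to both MAPPER and REDUCER, as described above. toyReduceBy is hypothetical and only mimics that forwarding; it is not how GenomicFiles is implemented. It shows that a REDUCER written as function(...) cannot tell MAPPER results apart from an extra argument intended for MAPPER.

```r
# Hypothetical stand-in for a reduceBy*-style driver (not GenomicFiles code):
# its '...' is passed to MAPPER and to REDUCER alike.
toyReduceBy <- function(X, MAPPER, REDUCER, ..., iterate = TRUE) {
    mapped <- lapply(X, MAPPER, ...)
    if (iterate) {
        # Pairwise: REDUCER(last, current, ...)
        Reduce(function(last, current) REDUCER(last, current, ...), mapped)
    } else {
        # All at once: REDUCER(list_of_results, ...)
        REDUCER(mapped, ...)
    }
}

MAPPER <- function(x, scale) x * scale   # 'scale' is meant for MAPPER only
REDUCER <- function(...) c(...)          # intends to combine MAPPER results only

res <- toyReduceBy(list(1, 2), MAPPER, REDUCER, scale = 10, iterate = TRUE)
res
```

Here res is c(10, 20, scale = 10): the scale argument meant for MAPPER leaks into the combined result because REDUCER's '...' also captures it.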
>
>
>
>
> - The unappealing aspect of the variadic approach is introducing a new
> check each time REDUCER is called.
>
>
> What is this check?
>
>
> - Going the other direction, considering a single arg for REDUCER instead
> of two requires coercing 'last' and 'current' to a list before pulling
> them apart again.
>
>
> What is the problem with constructing this list? Isn't that one extremely
> fast line of code?
>
>
> It's not the list construction but the lost convenience of named
> arguments, in addition to consistency with Reduce when the data are
> presented iteratively: REDUCER=`+` instead of
> REDUCER=function(lst) sum(unlist(lst, use.names=FALSE)).
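The contrast drawn above can be reproduced with base R alone. The results list below is a made-up stand-in for per-file MAPPER output; the point is only that the binary operator plugs straight into Reduce(), while the list-based form needs a wrapper.

```r
results <- list(1, 2, 3)   # hypothetical MAPPER outputs, one per file

# Pairwise (iterate=TRUE style): the binary operator is used directly,
# exactly as with base R's Reduce().
pairwise <- Reduce(`+`, results)

# List-based (iterate=FALSE style): the same reduction needs a wrapper
# that unpacks the list.
listREDUCER <- function(lst) sum(unlist(lst, use.names = FALSE))
listwise <- listREDUCER(results)

c(pairwise = pairwise, listwise = listwise)
```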
>
>
>
> It seems to me simpler to settle on one signature, and my preference
> would be for the single list argument, just because the call is smaller
> and simpler. Then have a convenient adaptor to handle the variadic case.
>
>
> The variadic adapter concept is easy enough to understand in
> context, but would send me for a head scratch at some later time.
>
> Martin
>
>
>
>
>
> Valerie
>
>
>
> On 06/15/14 16:36, Michael Lawrence wrote:
>
> I kind of prefer the adaptor solution, just for the sake of API
> cleanliness (the MAPPER/REDUCER pair has some elegance), but I think we
> agree that the iterate switch introduces undesirable coupling.
>
>
>
>
> On Sun, Jun 15, 2014 at 3:07 PM, Ryan <rct at thompsonclan.org> wrote:
>
> What about having two separate reducer arguments, one for a reducer that
> takes two elements at a time and combines them, and the other for a
> reducer that takes a list and combines all the elements of the list?
> Specifying both at once would be an error. I think it makes more sense
> to say "these two arguments expect different things" than "this one
> argument expects a different thing depending on the value of another
> argument".
>
> -Ryan
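A sketch of Ryan's proposal, assuming a hypothetical driver toyReduce with two mutually exclusive arguments, pairREDUCER and listREDUCER. Neither name exists in GenomicFiles; they only illustrate that each argument then has one fixed signature, independent of any iterate flag.

```r
# Hypothetical driver: exactly one of the two reducer arguments must be given.
toyReduce <- function(X, MAPPER, pairREDUCER = NULL, listREDUCER = NULL) {
    if (!xor(is.null(pairREDUCER), is.null(listREDUCER)))
        stop("specify exactly one of 'pairREDUCER' or 'listREDUCER'")
    mapped <- lapply(X, MAPPER)
    if (!is.null(pairREDUCER)) {
        Reduce(pairREDUCER, mapped)   # combines two elements at a time
    } else {
        listREDUCER(mapped)           # combines all elements of the list
    }
}

square <- function(x) x^2
toyReduce(1:3, square, pairREDUCER = `+`)
toyReduce(1:3, square, listREDUCER = function(lst) sum(unlist(lst)))
```

Supplying both arguments, or neither, stops with an error, which matches the rule that specifying both at once is an error.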
>
>
> On Sun Jun 15 11:17:59 2014, Michael Lawrence wrote:
>
> I just thought there is some benefit for the callback to be the same,
> regardless of the iterate setting. This would allow generalization across
> different data scales. Perhaps all that is needed is a constructor for an
> adapter closure, one for each direction.
>
> For example, the variadic adapter would look like:
>
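> Variadic <- function(FUN) {
>     function(x, y) {
>         if (missing(y)) {
>             # called once with a list: spread it into FUN's arguments
>             do.call(FUN, x)
>         } else {
>             # called pairwise: combine the two objects directly
>             FUN(x, y)
>         }
>     }
> }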
>
> That would make it easy to, e.g., adapt rbind into the framework. I wonder
> if there is precedent and better terminology from the functional
> programming domain?
>
> Michael
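The Variadic adapter above can be exercised with rbind in base R. The sketch repeats it so the block runs on its own, and adds a hypothetical Listwise adapter for the other direction (a function that expects a list, made callable pairwise). Listwise is an illustrative name only, and it assumes the reduction is associative, since pairwise calls feed the running result back in.

```r
# Michael's adapter: makes a variadic FUN usable in both iteration modes.
Variadic <- function(FUN) {
    function(x, y) {
        if (missing(y)) do.call(FUN, x) else FUN(x, y)
    }
}

# Hypothetical reverse adapter: makes a list-based FUN usable pairwise.
Listwise <- function(FUN) {
    function(x, y) {
        if (missing(y)) FUN(x) else FUN(list(x, y))
    }
}

pieces <- list(data.frame(a = 1), data.frame(a = 2), data.frame(a = 3))
vrbind <- Variadic(rbind)
pairwise <- Reduce(vrbind, pieces)   # iterate=TRUE style: rbind(x, y) repeatedly
listwise <- vrbind(pieces)           # iterate=FALSE style: do.call(rbind, pieces)

sumList <- function(lst) sum(unlist(lst, use.names = FALSE))
lsum <- Listwise(sumList)
Reduce(lsum, list(1, 2, 3))          # pairwise
lsum(list(1, 2, 3))                  # whole list at once
```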
>
>
>
> On Sun, Jun 15, 2014 at 8:38 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
> On 06/15/2014 07:34 AM, Michael Lawrence wrote:
>
>
> Hi guys,
>
> Was just checking out GenomicFiles and was a little surprised that the
> arguments to the REDUCER are different depending on iterate=TRUE vs.
> iterate=FALSE. In my often flawed opinion, iteration should not be a
> concern of the REDUCER. It should be oblivious to the iteration mode. In
> other words, when iterate=TRUE, it is a special case of having two
> objects to combine, instead of multiple.
>
>
> My 'rationale' was that one would choose iterate=FALSE when one required
> all elements to perform the reduction. I thought of the list (rather than
> ...) as the general R data structure for representing N elements, with a
> special case (consistent with Reduce) made for the pairwise reduction of
> iterate=TRUE. Either way, the two cases (x, y vs. list(), or x, y vs. ...)
> seem to require some explaining to the user. Is there a clear better
> choice? You're the second person to trip over this, so I guess there's a
> crack in the sidewalk...
>
> Martin
>
>
> What would be convenient (but unnecessary) is to detect from the formal
> arguments whether REDUCER is variadic or list-based. In other words, if
> REDUCER is defined like function(...) { } it is called via do.call(),
> otherwise it is passed the list.
>
> Thoughts? Maybe I'm totally confused?
>
> Michael
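A minimal sketch of the formals-based detection suggested above. callReducer is a hypothetical helper, not GenomicFiles code. It goes through args() so that primitives such as sum, whose formals() is NULL, are classified by their documented signature; treating any REDUCER whose first formal is '...' as variadic is an assumption of this sketch.

```r
# Hypothetical helper: choose how to call REDUCER on the list of MAPPER
# results by inspecting REDUCER's formal arguments.
callReducer <- function(REDUCER, results) {
    # args() returns a closure with the documented formals, which also
    # works for primitives such as sum
    fmls <- names(formals(args(REDUCER)))
    if (length(fmls) > 0L && fmls[[1L]] == "...") {
        do.call(REDUCER, results)   # variadic: spread the list into '...'
    } else {
        REDUCER(results)            # list-based: pass the list as one argument
    }
}

callReducer(function(...) sum(...), list(1, 2, 3))     # variadic closure
callReducer(sum, list(1, 2, 3))                        # primitive, via args()
callReducer(function(lst) length(lst), list(1, 2, 3))  # list-based
```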
>
>
> _________________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>