[Bioc-devel] GenomicFiles reducer and iterate argument

Martin Morgan mtmorgan at fhcrc.org
Tue Jun 17 22:39:26 CEST 2014


Val's out today and I'm at least part of the problem so...

On 06/17/2014 10:13 AM, Michael Lawrence wrote:
> On Tue, Jun 17, 2014 at 7:00 AM, Valerie Obenchain <vobencha at fhcrc.org>
> wrote:
>
>> Hi Michael, Ryan,
>>
>> Yes, it would be ideal to have a single signature for both cases of
>> 'iterate'. We went over the pros/cons again and at the end of the day
>> decided to keep things as they are. No perfect solution here.
>>
>> These were the primary points:
>>
>> - A disadvantage of defining REDUCER with only '...' is that '...' can
>> represent variables other than just the output from MAPPER.
>>
>>
> Do you mean that "..." will capture additional arguments? From where?

reduceBy* takes a '...' argument, and this is currently passed to both the
MAPPER and the REDUCER; see below.
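
A minimal sketch of why that matters, using a made-up stand-in (toy_reduce_by is hypothetical and is not the GenomicFiles implementation; it only mimics the '...' forwarding described above). A REDUCER defined with only '...' ends up seeing the extra arguments alongside the MAPPER output:

```r
# toy_reduce_by() is a hypothetical stand-in, NOT GenomicFiles code.
# It forwards '...' to both MAPPER and REDUCER, as described above.
toy_reduce_by <- function(inputs, MAPPER, REDUCER, ...) {
    mapped <- lapply(inputs, MAPPER, ...)
    REDUCER(mapped, ...)
}

scale_mapper <- function(x, scale, ...) x * scale

# A list-based REDUCER can name its first argument and ignore the rest.
scaled_total <- toy_reduce_by(
    1:3, scale_mapper,
    REDUCER = function(lst, ...) sum(unlist(lst, use.names=FALSE)),
    scale = 10
)

# A REDUCER with only '...' receives 'scale' mixed in with the mapped data.
seen_names <- toy_reduce_by(
    1:3, scale_mapper,
    REDUCER = function(...) names(list(...)),
    scale = 10
)
```

Here seen_names contains "scale" next to the unnamed MAPPER output, so the REDUCER would have to sort out which is which.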

>
>
>> - The unappealing aspect of the variadic approach is introducing a new
>> check each time REDUCER is called.
>>
>>
> What is this check?
>
>
>> - Going the other direction, considering a single arg for REDUCER instead
>> of two, requires coercing 'last' and 'current' to a list before pulling them
>> apart again.
>>
>>
> What is the problem with constructing this list? Isn't that one extremely
> fast line of code?

It's not the list construction but the lost convenience of named arguments, plus
consistency with Reduce when the data are presented iteratively:
REDUCER=`+` instead of REDUCER=function(lst) sum(unlist(lst, use.names=FALSE)).
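
To make the comparison concrete, here is a small base-R sketch; 'parts' is a made-up stand-in for MAPPER output and nothing here calls GenomicFiles:

```r
# Hypothetical per-file results, e.g. counts returned by a MAPPER.
parts <- list(3L, 5L, 7L)

# Pairwise style (as with iterate=TRUE): the reducer sees two arguments
# at a time, so an existing binary function such as `+` works unchanged.
pairwise_total <- Reduce(`+`, parts)

# All-at-once style (as with iterate=FALSE): the reducer receives one list
# and has to unpack it itself.
list_reducer <- function(lst) sum(unlist(lst, use.names=FALSE))
list_total <- list_reducer(parts)

# Both routes give the same answer; only the reducer's signature differs.
stopifnot(identical(pairwise_total, list_total))
```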

>
> It seems to me simpler to settle on one signature, and my preference would
> be for the single list argument, just because the call is smaller and
> simpler. Then have a convenient adaptor to handle the variadic case.

The variadic adapter concept is easy enough to understand in context, but would 
send me for a head scratch at some later time.
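
For reference, this is the adapter from Michael's earlier message (quoted further down), exercised both ways with rbind. It is only a sketch of that proposal, not something GenomicFiles provides:

```r
# Adapter as proposed in the quoted message: one closure, two call styles.
Variadic <- function(FUN) {
    function(x, y) {
        if (missing(y)) {
            do.call(FUN, x)   # single list argument: spread it over FUN
        } else {
            FUN(x, y)         # pairwise: pass both arguments straight through
        }
    }
}

reducer <- Variadic(rbind)

all_at_once <- reducer(list(1:2, 3:4, 5:6))  # list form
pairwise    <- reducer(1:2, 3:4)             # two-argument form
```

The same 'reducer' accepts either calling convention, and which branch ran is only visible by reading the closure.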

Martin

>
>
>>
>> Valerie
>>
>>
>>
>> On 06/15/14 16:36, Michael Lawrence wrote:
>>
>>> I kind of prefer the adaptor solution, just for the sake of API
>>> cleanliness
>>> (the MAPPER/REDUCER pair has some elegance), but I think we agree that the
>>> iterate switch introduces undesirable coupling.
>>>
>>>
>>>
>>>
>>> On Sun, Jun 15, 2014 at 3:07 PM, Ryan <rct at thompsonclan.org> wrote:
>>>
>>>> What about having two separate reducer arguments, one for a reducer that
>>>> takes two elements at a time and combines them, and the other for a
>>>> reducer
>>>> that takes a list and combines all the elements of the list? Specifying
>>>> both at once would be an error. I think it makes more sense to say "these
>>>> two arguments expect different things" than "this one argument expects a
>>>> different thing depending on the value of another argument".
>>>>
>>>> -Ryan
>>>>
>>>>
>>>> On Sun Jun 15 11:17:59 2014, Michael Lawrence wrote:
>>>>
>>>>> I just thought there is some benefit for the callback to be the same,
>>>>> regardless of the iterate setting. This would allow generalization
>>>>> across
>>>>> different data scales. Perhaps all that is needed is a constructor for
>>>>> an
>>>>> adapter closure, one for each direction.
>>>>>
>>>>> For example, the variadic adapter would look like:
>>>>>
>>>>> Variadic <- function(FUN) {
>>>>>      function(x, y) {
>>>>>        if (missing(y)) {
>>>>>          do.call(FUN, x)
>>>>>        } else {
>>>>>          FUN(x, y)
>>>>>        }
>>>>>      }
>>>>> }
>>>>>
>>>>> That would make it easy to e.g. adapt rbind into the framework. I wonder
>>>>> if
>>>>> there is precedent and better terminology from the functional
>>>>> programming
>>>>> domain?
>>>>>
>>>>> Michael
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Jun 15, 2014 at 8:38 AM, Martin Morgan <mtmorgan at fhcrc.org>
>>>>> wrote:
>>>>>
>>>>>> On 06/15/2014 07:34 AM, Michael Lawrence wrote:
>>>>>
>>>>>>
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> Was just checking out GenomicFiles and was a little surprised that the
>>>>>>> arguments to the REDUCER are different depending on iterate=TRUE vs.
>>>>>>> iterate=FALSE. In my often flawed opinion, iteration should not be a
>>>>>>> concern of the REDUCER. It should be oblivious to the iteration mode.
>>>>>>> In
>>>>>>> other words, when iterate=TRUE, it is a special case of having two
>>>>>>> objects
>>>>>>> to combine, instead of multiple.
>>>>>>>
>>>>>>>
>>>>>> My 'rationale' was that one would choose iterate=FALSE when one
>>>>>> required
>>>>>> all elements to perform the reduction. I thought of the list (rather
>>>>>> than
>>>>>> ...) as the general R data structure for representing N elements, with
>>>>>> a
>>>>>> special case (consistent with Reduce) made for the pairwise reduction
>>>>>> of
>>>>>> iterate=TRUE. Either way, the two cases (x, y vs. list(); x, y vs. ...)
>>>>>> seem to require some explaining to the user. Is there a clear better
>>>>>> choice? You're the second person to trip over this, so I guess there's
>>>>>> a
>>>>>> crack in the sidewalk...
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>>> What would be convenient (but unnecessary) is to detect from the formal
>>>>>>> arguments whether REDUCER is variadic or list-based. In other words,
>>>>>>> if
>>>>>>> REDUCER is defined like function(...) { } it is called via do.call(),
>>>>>>> otherwise it is passed the list.
>>>>>>>
>>>>>>> Thoughts? Maybe I'm totally confused?
>>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>>            [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>>>> 1100 Fairview Ave. N.
>>>>>> PO Box 19024 Seattle, WA 98109
>>>>>>
>>>>>> Location: Arnold Building M1 B861
>>>>>> Phone: (206) 667-2793
>>>>>>
>>>>>>
>>>>
>>
>>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


