[R] Reduce woes
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Fri Jul 29 23:06:40 CEST 2016
Having experienced some frustration myself when I first started with R many years ago, I can relate to your apparent frustration. However, if you would like to succeed in using R, I strongly recommend learning R and not trying to write Haskell or Erlang or C or Fortran or any other language when writing in R. I am sure there are many things R could do better, and once you understand how R actually works you might even be in a position to contribute some improvements. But thinking in those other languages with an R interpreter in front of you is just going to make you more frustrated.
For one thing, everything in R is a vector... even lists. Appending to a list is not O(1) as it would be for a linked list, because c() and append() build a new vector each time. Thus it is preferred to find algorithms that pre-allocate memory for results. A map operation (lapply) is 1:1 to encourage that. Reduce is N:1 because it is simpler that way. Use a map step to make a logical or grouping vector that you can use to select which elements you want to process, and then map over that subset of your input data or aggregate over the whole thing.
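A minimal sketch of that approach, applied to the pseudo code quoted below. The names `src`, `lens`, `keep` and the generated "item" names are made up for illustration, and this is only one of several reasonable ways to write it:

```r
# Input from the quoted pseudo code; called `src` here (hypothetical name)
# rather than `source`.
src <- list(c(1, 2, 3, 4), c(8, 7, 6, 5, 4, 3, 7), c(5, 4))

# Map step: exactly one integer per input element, so vapply() can
# allocate the result once at its final size.
lens <- vapply(src, length, integer(1))

# Selection vector derived from the mapped values.
keep <- lens > 2

# Keep the selected values and name them (the names are illustrative).
result <- as.list(lens[keep])
names(result) <- paste0("item", which(keep))
str(result)

# The same filter-then-map idea with base R's Filter(); the result is unnamed.
alt <- lapply(Filter(function(x) length(x) > 2, src), length)
```

Nothing here grows a list one element at a time: the mapped vector has a known length up front, and the subset is taken in a single indexing step.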
Also, names are attributes of the list vector... one name per element. Not all list operations maintain that attribute so you often have to explicitly copy names from source to destination.
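As a small illustration of the point about names, using the `data` list from the quoted messages (the variable names `by_lapply`, `by_reduce`, `keep` and `subset_named` are made up for this sketch):

```r
data <- list(one = c(1, 1), three = c(3), two = c(2, 2))

# lapply() carries names(data) over to its result.
by_lapply <- lapply(data, length)

# Reduce() passes each element to the function without its name, so the
# names have to be copied from the input explicitly afterwards.
by_reduce <- Reduce(function(acc, item) c(acc, list(length(item))),
                    data, list())
names(by_reduce) <- names(data)

identical(by_lapply, by_reduce)

# Subsetting with `[` keeps the names of the selected elements.
keep <- lengths(data) > 1
subset_named <- by_lapply[keep]
str(subset_named)
```

Filtering after the map step, rather than inside the reducing function, keeps the names aligned with the values without any extra bookkeeping.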
Oh, and "source" is a common base R function... so it is generally advised not to re-use such common names for your own objects in the global environment.
--
Sent from my phone. Please excuse my brevity.
On July 29, 2016 8:43:16 AM PDT, Stefan Kruger <stefan.kruger at gmail.com> wrote:
>>> I still don't understand why you want Reduce to do lapply's
>>> job. Reduce maps many to one and lapply maps many to
>>> many.
>
>Say you want to map a function over a subset of a vector or list? With the
>generalised version of Reduce you map many-to-one, but the one can be a
>'complex' structure. lapply() and friends not only map many-to-many, but
>X-to-X - the resulting list will be the same length as the source. This
>frequently gets used in Elixir, Erlang, Haskell etc as a means of
>processing a pipeline or stream - start with a vector, select a subset
>based on some predicate, turn this subset into an entirely different
>object/list/
>
>In iterative-fashion pseudo code
>
>source = list(c(1,2,3,4), c(8,7,6,5,4,3,7), c(5,4))
>result = { }
>foreach (item in source) {
> if (length(item) > 2) {
> result[generate_some_name()] = length(item)
> }
>}
>
>That's an example of what I want to do. It maps many (a subset of the
>vectors in source) to one (the result named list). It's a map-filter - but
>even more general than your typical map-filter in that you can change the
>data structure - e.g. map a function over a vector, use a subset of the
>results, and turn those into a list or S3 object.
>
>
>Stefan
>
>
>
>On 29 July 2016 at 15:54, William Dunlap <wdunlap at tibco.com> wrote:
>
>> Reduce (like lapply) apparently uses the [[ operator to
>> extract components from the list given to it. X[[i]] does
>> not attach names(X)[i] to its output (where would it put it?).
>> Hence your se
>>
>> To help understand what these functions are doing try
>> putting print statements in your test functions:
>> > data <- list(one = c(1, 1), three = c(3), two = c(2, 2))
>> > r <- Reduce(function(acc, item) { cat("acc="); str(acc); cat("item=");
>> +   str(item); length(item) }, data, init=list())
>> acc= list()
>> item= num [1:2] 1 1
>> acc= int 2
>> item= num 3
>> acc= int 1
>> item= num [1:2] 2 2
>> > data2 <- list(one = c(oneA=1, onB=1), three = c(threeA=3),
>> +   two = c(twoA=2, twoB=2))
>> > r <- Reduce(function(acc, item) { cat("acc="); str(acc); cat("item=");
>> +   str(item); length(item) }, data2, init=list())
>> acc= list()
>> item= Named num [1:2] 1 1
>> - attr(*, "names")= chr [1:2] "oneA" "onB"
>> acc= int 2
>> item= Named num 3
>> - attr(*, "names")= chr "threeA"
>> acc= int 1
>> item= Named num [1:2] 2 2
>> - attr(*, "names")= chr [1:2] "twoA" "twoB"
>>
>>
>> I still don't understand why you want Reduce to do lapply's
>> job. Reduce maps many to one and lapply maps many to
>> many.
>>
>>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Fri, Jul 29, 2016 at 1:37 AM, Stefan Kruger <stefan.kruger at gmail.com> wrote:
>>
>>> Jeremiah -
>>>
>>> neat - that's one step closer, but one small thing I still don't
>>> understand:
>>>
>>> > data <- list(one = c(1, 1), three = c(3), two = c(2, 2))
>>> > r = Reduce(function(acc, item) { append(acc, setNames(length(item),
>>> +   names(item))) }, data, list())
>>> > str(r)
>>> List of 3
>>> $ : int 2
>>> $ : int 1
>>> $ : int 2
>>>
>>> I wanted the names to remain, but it seems like the "data" parameter loses
>>> its names when consumed by the Reduce()? If I print "item" inside the
>>> reducing function, it's not got the names. I'm probably missing some
>>> central tenet of R here.
>>>
>>> As to your comment of this being lapply() implemented by Reduce() - as I
>>> understand lapply() (or map() in other functional languages), it's limited
>>> to returning a list/vector of the same length as the original. Consider
>>> this contrived example:
>>>
>>> > r = Reduce(function(acc, item) { if (length(item) > 1) {append(acc,
>>> +   setNames(length(item), names(item)))} }, data, list())
>>> > str(r)
>>> int 2
>>> > r
>>> [1] 2
>>>
>>> I don't think you could achieve that with lapply()?
>>>
>>> Thanks
>>>
>>> Stefan
>>>
>>>
>>> On 28 July 2016 at 20:19, jeremiah rounds <roundsjeremiah at gmail.com>
>>> wrote:
>>>
>>> > Basically using Reduce as an lapply in that example, but I think that was
>>> > caused by how people started talking about things in the first place =) But
>>> > the point is the accumulator can be anything as far as I can tell.
>>> >
>>> > On Thu, Jul 28, 2016 at 12:14 PM, jeremiah rounds <
>>> > roundsjeremiah at gmail.com> wrote:
>>> >
>>> >> Re:
>>> >> "What I'm trying to
>>> >> work out is how to have the accumulator in Reduce not be the same type as
>>> >> the elements of the vector/list being reduced - ideally it could be an S3
>>> >> instance, list, vector, or data frame."
>>> >>
>>> >> Pretty sure that is not true. See code that follows. I would never
>>> >> solve this task in this way though so no comment on the use of Reduce for
>>> >> what you described. (Note the accumulation of "functions" in a list is
>>> >> just a demo of possibilities). You could accumulate in an environment too
>>> >> and potentially gain a lot of copy efficiency.
>>> >>
>>> >>
>>> >> lookup = list()
>>> >> lookup[[as.character(1)]] = function() print("1")
>>> >> lookup[[as.character(2)]] = function() print("2")
>>> >> lookup[[as.character(3)]] = function() print("3")
>>> >>
>>> >> data = list(c(1,2), c(1,4), c(3,3), c(2,30))
>>> >>
>>> >>
>>> >> r = Reduce(function(acc, item) {
>>> >> append(acc, list(lookup[[as.character(min(item))]]))
>>> >> }, data,list())
>>> >> r
>>> >> for(f in r) f()
>>> >>
>>> >>
>>> >> On Thu, Jul 28, 2016 at 5:09 AM, Stefan Kruger <stefan.kruger at gmail.com> wrote:
>>> >>
>>> >>> Ulrik - many thanks for your reply.
>>> >>>
>>> >>> I'm aware of many simple solutions as the one you suggest, both iterative
>>> >>> and functional style - but I'm trying to learn how to bend Reduce() for the
>>> >>> purpose of using it in more complex processing tasks. What I'm trying to
>>> >>> work out is how to have the accumulator in Reduce not be the same type as
>>> >>> the elements of the vector/list being reduced - ideally it could be an S3
>>> >>> instance, list, vector, or data frame.
>>> >>>
>>> >>> Here's a more realistic example (in Elixir, sorry)
>>> >>>
>>> >>> Given two lists:
>>> >>>
>>> >>> 1. data: maps an id string to a vector of revision strings
>>> >>> 2. dict: maps known id/revision pairs as a string to true (or 1)
>>> >>>
>>> >>> find the items in data not already in dict, returned as a named list.
>>> >>>
>>> >>> ```elixir
>>> >>> data = %{
>>> >>> "id1" => ["rev1.1", "rev1.2"],
>>> >>> "id2" => ["rev2.1"],
>>> >>> "id3" => ["rev3.1", "rev3.2", "rev3.3"]
>>> >>> }
>>> >>>
>>> >>> dict = %{
>>> >>> "id1/rev1.1" => 1,
>>> >>> "id1/rev1.2" => 1,
>>> >>> "id3/rev3.1" => 1
>>> >>> }
>>> >>>
>>> >>> # Find the items in data not already in dict. Return as a grouped map
>>> >>>
>>> >>> Map.keys(data)
>>> >>> |> Enum.flat_map(fn id -> Enum.map(data[id], fn rev -> {id, rev} end) end)
>>> >>> |> Enum.filter(fn {id, rev} -> !Dict.has_key?(dict, "#{id}/#{rev}") end)
>>> >>> |> Enum.reduce(%{}, fn ({k, v}, d) -> Map.update(d, k, [v], &[v|&1]) end)
>>> >>> ```
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> On 28 July 2016 at 12:03, Ulrik Stervbo <ulrik.stervbo at gmail.com> wrote:
>>> >>>
>>> >>> > Hi Stefan,
>>> >>> >
>>> >>> > in that case, lapply(data, length) should do the trick.
>>> >>> >
>>> >>> > Best wishes,
>>> >>> > Ulrik
>>> >>> >
>>> >>> > On Thu, 28 Jul 2016 at 12:57 Stefan Kruger <stefan.kruger at gmail.com> wrote:
>>> >>> >
>>> >>> >> David - many thanks for your response.
>>> >>> >>
>>> >>> >> What I tried to do was to turn
>>> >>> >>
>>> >>> >> data <- list(one = c(1, 1), three = c(3), two = c(2, 2))
>>> >>> >>
>>> >>> >> into
>>> >>> >>
>>> >>> >> result <- list(one = 2, three = 1, two = 2)
>>> >>> >>
>>> >>> >> that is creating a new list which has the same names as the first, but
>>> >>> >> where the values are the vector lengths.
>>> >>> >>
>>> >>> >> I know there are many other (and better) trivial ways of achieving this -
>>> >>> >> my aim is less the task itself, and more figuring out if this can be done
>>> >>> >> using Reduce() in the fashion I showed in the other examples I gave. It's a
>>> >>> >> building block of doing map-filter-reduce type pipelines that I'd like to
>>> >>> >> understand how to do in R.
>>> >>> >>
>>> >>> >> Fumbling in the dark, I tried:
>>> >>> >>
>>> >>> >> Reduce(function(acc, item) { setNames(c(acc, length(data[item])), item) },
>>> >>> >>        names(data), accumulate=TRUE)
>>> >>> >>
>>> >>> >> but setNames sets all the names, not adding one - and acc is still a
>>> >>> >> vector, not a list.
>>> >>> >>
>>> >>> >> It looks like 'lambda.tools.fold()' and possibly 'purrr.reduce()' aim at
>>> >>> >> doing what I'd like to do - but I've not been able to figure out quite
>>> >>> >> how.
>>> >>> >>
>>> >>> >> Thanks
>>> >>> >>
>>> >>> >> Stefan
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >> On 27 July 2016 at 20:35, David Winsemius <dwinsemius at comcast.net> wrote:
>>> >>> >>
>>> >>> >> >
>>> >>> >> > > On Jul 27, 2016, at 8:20 AM, Stefan Kruger <stefan.kruger at gmail.com> wrote:
>>> >>> >> > >
>>> >>> >> > > Hi -
>>> >>> >> > >
>>> >>> >> > > I'm new to R.
>>> >>> >> > >
>>> >>> >> > > In other functional languages I'm familiar with you can often seed a call
>>> >>> >> > > to reduce() with a custom accumulator. Here's an example in Elixir:
>>> >>> >> > >
>>> >>> >> > > map = %{"one" => [1, 1], "three" => [3], "two" => [2, 2]}
>>> >>> >> > > map |> Enum.reduce(%{}, fn ({k,v}, acc) ->
>>> >>> >> > >   Map.update(acc, k, Enum.count(v), nil) end)
>>> >>> >> > > # %{"one" => 2, "three" => 1, "two" => 2}
>>> >>> >> > >
>>> >>> >> > > In R-terms that's reducing a list of vectors to become a new list mapping
>>> >>> >> > > the names to the vector lengths.
>>> >>> >> > >
>>> >>> >> > > Even in JavaScript, you can do similar things:
>>> >>> >> > >
>>> >>> >> > > list = { one: [1, 1], three: [3], two: [2, 2] };
>>> >>> >> > > var result = Object.keys(list).reduceRight(function (acc, item) {
>>> >>> >> > > acc[item] = list[item].length;
>>> >>> >> > > return acc;
>>> >>> >> > > }, {});
>>> >>> >> > > // result == { two: 2, three: 1, one: 2 }
>>> >>> >> > >
>>> >>> >> > > In R, from what I can gather, Reduce() is restricted such that any init
>>> >>> >> > > value you feed it is required to be of the same type as the elements of the
>>> >>> >> > > vector you're reducing -- so I can't build up. So whilst I can do, say
>>> >>> >> > >
>>> >>> >> > >> Reduce(function(acc, item) { acc + item }, c(1,2,3,4,5), 96)
>>> >>> >> > > [1] 111
>>> >>> >> > >
>>> >>> >> > > I can't use Reduce to build up a list, vector or data frame?
>>> >>> >> > >
>>> >>> >> > > What am I missing?
>>> >>> >> > >
>>> >>> >> > > Many thanks for any pointers,
>>> >>> >> >
>>> >>> >> > This builds a list:
>>> >>> >> >
>>> >>> >> > > Reduce(function(acc, item) { c(acc , item) }, c(1,2,3,4,5), 96,
>>> >>> >> > +        accumulate=TRUE)
>>> >>> >> > [[1]]
>>> >>> >> > [1] 96
>>> >>> >> >
>>> >>> >> > [[2]]
>>> >>> >> > [1] 96 1
>>> >>> >> >
>>> >>> >> > [[3]]
>>> >>> >> > [1] 96 1 2
>>> >>> >> >
>>> >>> >> > [[4]]
>>> >>> >> > [1] 96 1 2 3
>>> >>> >> >
>>> >>> >> > [[5]]
>>> >>> >> > [1] 96 1 2 3 4
>>> >>> >> >
>>> >>> >> > [[6]]
>>> >>> >> > [1] 96 1 2 3 4 5
>>> >>> >> >
>>> >>> >> > But you are not saying what you want. The other examples were doing
>>> >>> >> > something with names but you provided no names for the R example.
>>> >>> >> >
>>> >>> >> > This would return a list of named vectors:
>>> >>> >> >
>>> >>> >> > > Reduce(function(acc, item) { setNames( c(acc,item), 1:(item+1)) },
>>> >>> >> > +        c(1,2,3,4,5), 96, accumulate=TRUE)
>>> >>> >> > [[1]]
>>> >>> >> > [1] 96
>>> >>> >> >
>>> >>> >> > [[2]]
>>> >>> >> > 1 2
>>> >>> >> > 96 1
>>> >>> >> >
>>> >>> >> > [[3]]
>>> >>> >> > 1 2 3
>>> >>> >> > 96 1 2
>>> >>> >> >
>>> >>> >> > [[4]]
>>> >>> >> > 1 2 3 4
>>> >>> >> > 96 1 2 3
>>> >>> >> >
>>> >>> >> > [[5]]
>>> >>> >> > 1 2 3 4 5
>>> >>> >> > 96 1 2 3 4
>>> >>> >> >
>>> >>> >> > [[6]]
>>> >>> >> > 1 2 3 4 5 6
>>> >>> >> > 96 1 2 3 4 5
>>> >>> >> >
>>> >>> >> >
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > > Stefan
>>> >>> >> > >
>>> >>> >> > >
>>> >>> >> > >
>>> >>> >> > > --
>>> >>> >> > > Stefan Kruger <stefan.kruger at gmail.com>
>>> >>> >> > >
>>> >>> >> > > [[alternative HTML version deleted]]
>>> >>> >> > >
>>> >>> >> > > ______________________________________________
>>> >>> >> > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> >>> >> > > https://stat.ethz.ch/mailman/listinfo/r-help
>>> >>> >> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> >>> >> > > and provide commented, minimal, self-contained, reproducible code.
>>> >>> >> >
>>> >>> >> > David Winsemius
>>> >>> >> > Alameda, CA, USA
>>> >>> >> >
>>> >>> >> >
>>> >>> >>
>>> >>> >>
>>> >>> >> --
>>> >>> >> Stefan Kruger <stefan.kruger at gmail.com>
>>> >>> >>
>>> >>> >>
>>> >>> >
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Stefan Kruger <stefan.kruger at gmail.com>
>>> >>>
>>> >>>
>>> >>
>>> >>
>>> >
>>>
>>>
>>> --
>>> Stefan Kruger <stefan.kruger at gmail.com>
>>>
>>>
>>
>>