[R] Is there a hash data structure for R
Jan van der Laan
rhelp using eoos.dds.nl
Wed Nov 3 10:47:06 CET 2021
On 03-11-2021 00:42, Avi Gross via R-help wrote:
>
> Finally, someone mentioned how creating a data.frame with duplicate names
> for columns is not a problem as it can automagically CHANGE them to be
> unique. That is a HUGE problem for using that as a dictionary as the new
> name will not be known to the system so all kinds of things will fail.
I think you are referring to my remark which was:
> However, the data.frame construction method will detect this and
> generate unique names (which also might not be what you want):
I didn't say this means that duplicate names aren't a problem; I just
mentioned that the behaviour is different. Personally, I would actually
prefer the behaviour of list (keep the duplicated name) with a warning.
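To make the difference concrete, here is a small sketch in base R. The
object names (l, d, d2) and the data are only illustrative and are not
taken from the earlier messages:

```r
# list() keeps a duplicated name without complaint
l <- list(x = 1:5, y = month.name, x = 3:7)
names(l)          # "x" "y" "x"
l[["x"]]          # only the first element named "x": 1 2 3 4 5

# data.frame() with the default check.names = TRUE renames the second "x"
d <- data.frame(x = 1:3, x = 4:6)
names(d)          # "x" "x.1"

# with check.names = FALSE the duplicate is kept, as in a list
d2 <- data.frame(x = 1:3, x = 4:6, check.names = FALSE)
names(d2)         # "x" "x"
```

In both duplicated cases a lookup by name only reaches the first match,
which is why neither behaves like a dictionary.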
Most of the responses seem to assume that the OP actually wants a hash
table. Yes, he did ask for that, and for a hash table an environment
(with some work) would be a good option. But in many cases where other
languages would use a hash-table-like object (such as a dict), in R you
would use other types of objects. Furthermore, for many operations where
you might use hash tables to implement the operation, R already has
built-in options, for example %in%, match and duplicated. These are also
vectorised, so two vectors, one with keys and one with values, might
actually be faster than an environment in some use cases.
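As a rough illustration of the two-vector approach (the vectors keys,
values and query are made up for this example), a lookup could look
something like this:

```r
keys   <- c("apple", "banana", "cherry")
values <- c(10, 20, 30)

# vectorised lookup: several keys at once, NA for a key that is absent
query <- c("cherry", "apple", "kiwi")
values[match(query, keys)]     # 30 10 NA

# membership test
query %in% keys                # TRUE TRUE FALSE

# check that the keys are unique before relying on them as a dictionary
anyDuplicated(keys) == 0       # TRUE
duplicated(c("a", "b", "a"))   # FALSE FALSE TRUE
```

Whether this is faster than an environment will depend on the use case;
it mainly pays off when many keys are looked up in one call.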
Best,
Jan
>
> And there are also packages for many features like sets as well as functions
> to manipulate these things.
>
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Bill Dunlap
> Sent: Tuesday, November 2, 2021 1:26 PM
> To: Andrew Simmons <akwsimmo using gmail.com>
> Cc: R Help <r-help using r-project.org>
> Subject: Re: [R] Is there a hash data structure for R
>
> Note that an environment carries a hash table with it, while a named list
> does not. I think that looking up an entry in a list causes a hash table to
> be created and thrown away. Here are some timings involving setting and
> getting various numbers of entries in environments and lists. The times are
> roughly linear in n for environments and quadratic for lists.
>
> > vapply(1e3 * 2 ^ (0:6), f, L = new.env(parent = emptyenv()),
> +        FUN.VALUE = NA_real_)
> [1] 0.00 0.00 0.00 0.02 0.03 0.06 0.15
> > vapply(1e3 * 2 ^ (0:6), f, L = list(), FUN.VALUE = NA_real_)
> [1] 0.01 0.03 0.15 0.53 2.66 13.66 56.05
> > f
> function(n, L, V = sprintf("V%07d", sample(n, replace = TRUE))) {
>   system.time(for (v in V) L[[v]] <- c(L[[v]], v))["elapsed"]
> }
>
> Note that environments do not allow an element named "" (the empty string).
>
> Elements named NA_character_ are treated differently in environments and
> lists, neither of which is great. You may want your hash table functions to
> deal with oddball names explicitly.
>
> -Bill
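One possible way to wrap an environment so that the oddball names are
handled explicitly is sketched below. The function names (ht_new,
check_key, ht_set, ht_has, ht_get) are hypothetical helpers invented for
this example, not part of base R or any package, and rejecting "" and NA
keys is just one design choice among several:

```r
# hypothetical helpers built on base R environments
ht_new <- function() new.env(hash = TRUE, parent = emptyenv())

check_key <- function(key) {
  # reject oddball names explicitly instead of relying on what
  # environments happen to do with them
  if (!is.character(key) || length(key) != 1L || is.na(key) || !nzchar(key))
    stop("key must be a single non-empty, non-NA string")
  key
}

ht_set <- function(ht, key, value) {
  assign(check_key(key), value, envir = ht)
  invisible(ht)
}

ht_has <- function(ht, key) {
  exists(check_key(key), envir = ht, inherits = FALSE)
}

ht_get <- function(ht, key, default = NULL) {
  if (ht_has(ht, key)) get(key, envir = ht, inherits = FALSE) else default
}

h <- ht_new()
ht_set(h, "a", 1)
ht_set(h, "a", 2)      # same key: the old value is overwritten
ht_get(h, "a")         # 2
ht_get(h, "b", NA)     # NA, the supplied default, since "b" is not present
length(ls(h))          # 1
```

Because environments are modified in place, ht_set() changes h without
needing to reassign the result.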
>
> On Tue, Nov 2, 2021 at 8:52 AM Andrew Simmons <akwsimmo using gmail.com> wrote:
>
>> If you're thinking about using environments, I would suggest you
>> initialize them like
>>
>>
>> x <- new.env(parent = emptyenv())
>>
>>
>> Since environments have parent environments, requesting a value from
>> that environment can actually return the value stored in a parent
>> environment (this isn't an issue for [[ or $; it is exclusively an
>> issue with assign, get, and exists). Or, if you've already got your
>> values stored in a list that you want to turn into an environment:
>>
>>
>> x <- list2env(listOfValues, parent = emptyenv())
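A small sketch of the parent-environment effect described above, using
only base R (the names e1 and e2 are illustrative). Note that get() and
exists() default to inherits = TRUE, which is where the lookup in parent
environments comes from:

```r
e1 <- new.env()                      # parent is the calling environment
e2 <- new.env(parent = emptyenv())   # no parent to fall back on

exists("mean", envir = e1)                     # TRUE: found via the parents
exists("mean", envir = e1, inherits = FALSE)   # FALSE
exists("mean", envir = e2)                     # FALSE

e1[["mean"]]                                   # NULL: [[ ignores the parents
```

With parent = emptyenv() the inherits argument no longer matters, since
there is nothing to inherit from.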
>>
>>
>> Hope this helps!
>>
>>
>> On Tue, Nov 2, 2021, 06:49 Yonghua Peng <yong using pobox.com> wrote:
>>
>>> But for data.frame the colnames can be duplicated. Am I right?
>>>
>>> Regards.
>>>
>>> On Tue, Nov 2, 2021 at 6:29 PM Jan van der Laan <rhelp using eoos.dds.nl> wrote:
>>>
>>>>
>>>> True, but in a lot of cases where a python user might use a dict
>>>> an R user will probably use a list; or when we are talking about
>>>> arrays of dicts in python, the R solution will probably be a
>>>> data.frame (with each dict field in a separate column).
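As a sketch of what that could look like in practice (the objects
person and people and their contents are invented for illustration):

```r
# one "dict": a named list
person <- list(name = "Ann", age = 31)
person[["age"]]                          # 31
person[["city"]] <- "Utrecht"            # add a field
names(person)                            # "name" "age" "city"

# an "array of dicts": one row per record, one column per field
people <- data.frame(name = c("Ann", "Bob"), age = c(31, 45))
people[people$name == "Bob", "age"]      # 45
people$age[match("Ann", people$name)]    # 31
```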
>>>>
>>>> Jan
>>>>
>>>>
>>>>
>>>>
>>>> On 02-11-2021 11:18, Eric Berger wrote:
>>>>> One choice is
>>>>> new.env(hash=TRUE)
>>>>> in the base package
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Nov 2, 2021 at 11:48 AM Yonghua Peng <yong using pobox.com> wrote:
>>>>>
>>>>>> I know this is a newbie question. But how do I implement the
>>>>>> hash structure which is available in other languages (in python
>>>>>> it's dict)?
>>>>>>
>>>>>> I know there is the list, but list's names can be duplicated here.
>>>>>>
>>>>>>> x <- list(x=1:5,y=month.name,x=3:7)
>>>>>>> x
>>>>>> $x
>>>>>> [1] 1 2 3 4 5
>>>>>>
>>>>>> $y
>>>>>>  [1] "January"   "February"  "March"     "April"     "May"       "June"
>>>>>>  [7] "July"      "August"    "September" "October"   "November"  "December"
>>>>>>
>>>>>> $x
>>>>>> [1] 3 4 5 6 7
>>>>>>
>>>>>> Thanks a lot.
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more,
>>>>>> see https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>