[R] Dynamic Dictionary Data Type?
Prof Brian Ripley
ripley at stats.ox.ac.uk
Thu Jun 2 20:40:35 CEST 2005
On Thu, 2 Jun 2005, Duncan Murdoch wrote:
> Gabor Grothendieck wrote:
>> On 6/2/05, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
>>
>>> On Thu, 2 Jun 2005, hadley wickham wrote:
>>>
>>>
>>>>> An environment is a hash table, and copying occurs only on modification,
>>>>> when any language would have to copy in this context.
>>>
>>> Caveat: the default is new.env(hash=FALSE), so an environment is a hash
>>> table in one sense but not necessarily so in the strict sense.
>>
>>
>> Can you expand on this? When would one use hash = TRUE vs.
>> the default?
>
> It's an optimization question. hash = TRUE uses more memory and takes longer
> to set up, but will give faster searches in big environments. The current
> default assumes that environments are small so the effort of building the
> hash table is not worthwhile, and it does linear searches.
It's not really size: building small hash tables is quick. The issue is
more to do with whether there are many lookups done compared to entries.
We met the same issues for a named vector a while back. The relevant NEWS
item was
o Indexing a vector by a character vector was slow if both the
vector and index were long (say 10,000). Now hashing is used
and the time should be linear in the longer of the lengths
(but more memory is used).
> I suspect that we might have the default wrong (or perhaps should make the
> transition from linear to hash search automatically at a certain threshold
> size), but I haven't done the testing necessary to show this.
Here's an example
tr <- as.character(trunc(runif(1e5, max=100)))
system.time({
env <- new.env(hash=F)
for(i in 1:1e5) assign(tr[i], i, envir=env)
}, gcFirst=TRUE)
which takes about 5% less with hashing. Now change to max=1e4: the hashed
version takes about 50% longer, the unhashed one 120x.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
More information about the R-help
mailing list