[Rd] names function for environments?
Michael Lawrence
lawrence.michael at gene.com
Thu Jan 29 15:07:14 CET 2015
On Thu, Jan 29, 2015 at 5:51 AM, Martin Maechler
<maechler at lynne.stat.math.ethz.ch> wrote:
> >>>>> Michael Lawrence <lawrence.michael at gene.com>
> >>>>> on Tue, 27 Jan 2015 07:59:59 -0800 writes:
>
> > Since the contract of ls() is to sort, there is nothing wrong with
> > programmers depending on it. And there are many functions that could be
> > made 60X faster, but is it worth it? But I did notice that
> > as.list.environment has a sorted=FALSE argument already, so I guess
> > identical(names(x), names(as.list(x))) could be made to be TRUE, assuming
> > the order is at least persistent, if undefined, so that is a nice property.
> > I guess I'm OK with it.
>
> As we ended only hearing "pro"s and no real "con"s,
> I've committed (a corrected version of) the code now.
>
> The above identity does not hold in general, though, but
>
> identical(names(x), names(as.list(x, all.names=TRUE)))
>
> now does for an environment 'x'.
>
> One could think to change the default of 'all.names' in
> as.list.environment(.) from FALSE to TRUE,
> but that may break code in subtle places and I don't think we
> should go there.
>
>
Yea, but it is super weird that as.list() filters elements out of the
environment during coercion.
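
To make the filtering concrete, here is a minimal sketch. The environment
`e` and the names `visible` and `.hidden` are made up for illustration, and
the comparison with names(e) assumes an R version that already includes the
change discussed in this thread (R >= 3.2.0):

```r
# Hypothetical environment with one ordinary name and one dot-name.
e <- new.env()
assign("visible", 1, envir = e)
assign(".hidden", 2, envir = e)

# Default coercion (all.names = FALSE) silently drops names starting with ".".
default_names <- names(as.list(e))

# all.names = TRUE keeps every binding; the order is unspecified.
all_names <- names(as.list(e, all.names = TRUE))

print(default_names)
print(sort(all_names))

# With names() defined for environments, this should hold:
print(setequal(names(e), all_names))
```

Only the all.names=TRUE form round-trips every binding, which is why the
identity quoted above needs that argument.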
I think I'm going to look through the uses of ls() inside of the core
packages, because I suspect that it is usually not necessary to extract the
keys, and that there is a cleaner (and probably faster) way to achieve the
same result.
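
As a rough sketch of the kind of rewrite I have in mind (this is not taken
from the core packages; the environment `e`, its contents, and the summing
task are hypothetical), compare a loop over extracted keys with a version
that never materializes the keys at all:

```r
# Hypothetical environment used as a hash of numeric values.
e <- new.env()
assign("a", 1, envir = e)
assign("b", 2, envir = e)
assign(".c", 3, envir = e)

# Key-based pattern: list (and sort) the keys, then look each one up.
total_via_keys <- 0
for (key in ls(e, all.names = TRUE)) {
  total_via_keys <- total_via_keys + get(key, envir = e)
}

# Direct pattern: apply a function over the bindings with eapply(),
# skipping the sorted key vector and the per-key get() calls.
total_direct <- sum(unlist(eapply(e, identity, all.names = TRUE)))

print(total_via_keys)
print(total_direct)
```

Whether the second form is actually faster in any given place would need
to be measured; the sketch only shows that the sorted key vector is often
not needed to get the result.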
> Martin
>
>
> > On Tue, Jan 27, 2015 at 7:44 AM, Peter Haverty
> > <haverty.peter at gene.com> wrote:
>
> >> I think that the "sorted" and "all.names" arguments are really only
> >> appropriate for pretty printing to the screen. I think it is a bit
> >> unfortunate that environments have a names accessor that is 60X slower
> >> than all the other types. This is likely due to the history of
> >> environments, which were originally just for behind-the-scenes tasks.
> >>
> >> Now that users can use environments as hashes, we really need
> >> something like a "keys" function. We don't want programmers depending
> >> on the sorted-ness, as Martin mentioned. Also, I think it helps users
> >> when objects share as many of the key API functions as possible.
> >> "names" is natural. "ls" was certainly confusing for me when I
> >> started. Having to supply two additional arguments to get the desired
> >> output doesn't help there. Think of all the perl programmers
> >> struggling to switch to R. Let's help them out.
> >> Pete
> >>
> >> ____________________
> >> Peter M. Haverty, Ph.D.
> >> Genentech, Inc.
> >> phaverty at gene.com
> >>
> >>
> >> On Tue, Jan 27, 2015 at 7:26 AM, Michael Lawrence
> >> <lawrence.michael at gene.com> wrote:
> >> > I think ls(, sort=FALSE) would be more explicit and thus clearer. There is
> >> > much precedent for having arguments that request less work to be done e.g.
> >> > unlist(use.names=FALSE). Yes, the extra typing is a bit painful, but there
> >> > is no intuitive reason why names() would be unsorted, while ls() would be
> >> > sorted. While it is tempting to use an existing function for this, the word
> >> > "names" is somewhat loaded. For example, one might expect
> >> > identical(names(env), names(as.list(env))) to be TRUE. I see no problem with
> >> > making names() a simple alias of ls(), as long as the behavior is the same.
> >> > Maybe a different name would be less "loaded" and imply lack of order,
> >> > something like keySet(). But do we really need this?
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Tue, Jan 27, 2015 at 7:11 AM, Martin Maechler
> >> > <maechler at lynne.stat.math.ethz.ch> wrote:
> >> >>
> >> >> >>>>> Peter Haverty <haverty.peter at gene.com>
> >> >> >>>>> on Sun, 25 Jan 2015 12:21:04 -0800 writes:
> >> >>
> >> >> > Hi all,
> >> >> > The "ls" function wears two hats. It allows users to inspect an
> >> >> > environment interactively and also serves deeper in code as the
> >> >> > accessor for an environment's names/keys. I propose that we separate
> >> >> > these two conflicting goals, keeping ls for interactive use and adding
> >> >> > names for a quick listing of the hash keys. This involves adding two
> >> >> > lines to do_names in attrib.c.
> >> >>
> >> >> > The 'ls' function and its 'objects' synonym appear very frequently in
> >> >> > performance-critical code like base/R/namespace.R and throughout the
> >> >> > methods package. These functions are currently among the major
> >> >> > contributors to execution time in package loading.
> >> >>
> >> >> > This two-line addition to attrib.c gives a significant speedup for
> >> >> > listing an environment's names/keys (2-60X depending on the 'sorted'
> >> >> > argument). It also simplifies the environment API by making it more
> >> >> > like the other basic types. We already have $ and [[ after all.
> >> >>
> >> >> > Rather than sprinkling sorted=FALSE throughout the methods and base
> >> >> > code, let's use names.
> >> >>
> >> >> as for list()s and other (generalized) vectors.
> >> >>
> >> >> This sounds appealing at first, and I have heard/seen others propose
> >> >> it. I see one good reason *not* to allow it (and you mention the
> >> >> reason by mentioning 'sorted'):
> >> >>
> >> >> The contents of an environment are inherently unordered, and
> >> >> even if the order stays fixed for a while, no code should rely
> >> >> on the ordering of the objects, and for that reason,
> >> >> <env>[1] etc do not make sense and are not allowed.
> >> >>
> >> >> > Would you be open to this change?
> >> >>
> >> >> I'm undecided currently:
> >> >> "-": reason above;
> >> >> "+": convenience, compacter R code using it;
> >> >> very simple and natural change to src/main/attrib.c
> >> >>
> >> >> and waiting for other comments, not the least from other members of
> >> >> R core ..
> >> >>
> >> >> Martin Maechler, ETH Zurich
> >> >>
> >> >>
> >> >> > I have submitted a patch and some timings to the bug tracker as
> >> >> > https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16170
> >> >>
> >> >> > Regards,
> >> >> > Pete
> >> >>
> >> >> > ____________________
> >> >> > Peter M. Haverty, Ph.D.
> >> >> > Genentech, Inc.
> >> >> > phaverty at gene.com
> >> >>
> >> >> > ______________________________________________
> >> >> > R-devel at r-project.org mailing list
> >> >> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >> >>
> >> >
> >> >
> >>
>