[Rd] names function for environments?

Michael Lawrence lawrence.michael at gene.com
Thu Jan 29 15:07:14 CET 2015


On Thu, Jan 29, 2015 at 5:51 AM, Martin Maechler <
maechler at lynne.stat.math.ethz.ch> wrote:

> >>>>> Michael Lawrence <lawrence.michael at gene.com>
> >>>>>     on Tue, 27 Jan 2015 07:59:59 -0800 writes:
>
>     > Since the contract of ls() is to sort, there is nothing wrong with
>     > programmers depending on it. And there are many functions that could
>     > be made 60X faster, but is it worth it? But I did notice that
>     > as.list.environment has a sorted=FALSE argument already, so I guess
>     > identical(names(x), names(as.list(x))) could be made to be TRUE,
>     > assuming the order is at least persistent, if undefined, so that is
>     > a nice property. I guess I'm OK with it.
>
> As we ended up hearing only "pro"s and no real "con"s,
> I've committed (a corrected version of) the code now.
>
> The above identity does not hold in general, though, but
>
>         identical(names(x), names(as.list(x, all.names=TRUE)))
>
> is now TRUE for an environment 'x'.
>
> One could think to change the default of 'all.names' in
> as.list.environment(.) from FALSE to TRUE,
> but that may break code in subtle places and I don't think we
> should go there.
>
>
Yeah, but it is super weird that as.list() filters elements out of the
environment during coercion.
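
To make the filtering concrete, here is a minimal sketch. It assumes
R >= 3.2.0 (where names() accepts an environment), and the environment
'x' and its contents are made up purely for illustration:

```r
# Hypothetical environment, for illustration only.
x <- new.env()
assign("b", 2, envir = x)
assign("a", 1, envir = x)
assign(".hidden", 0, envir = x)

# Default coercion (all.names = FALSE) drops names that begin with a dot.
visible <- names(as.list(x))
# Asking for all names keeps them.
everything <- names(as.list(x, all.names = TRUE))

# The name(s) that the default coercion filtered out.
dropped <- setdiff(everything, visible)

# The identity quoted above, with all.names = TRUE.
same_keys <- identical(names(x), everything)
```

Here 'dropped' should contain only the dot-prefixed name, and
'same_keys' is the identity Martin describes; the order of the names
themselves is not something to rely on.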

I think I'm going to look through the uses of ls() inside of the core
packages, because I suspect that it is usually not necessary to extract
the keys, and that there is a cleaner (and probably faster) way to achieve
the same result.

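As a rough sketch of the kind of rewrite I have in mind (not taken from
the core packages; the environment 'reg' and its contents are
hypothetical), two common ls()-based patterns can often be expressed
without materializing the sorted key vector:

```r
# Hypothetical registry environment, for illustration only.
reg <- new.env()
assign("alpha", 1:3, envir = reg)
assign("beta", 4:6, envir = reg)

# Membership test: via the sorted key vector ...
has_via_ls <- "alpha" %in% ls(reg)
# ... versus a direct lookup restricted to this environment.
has_direct <- exists("alpha", envir = reg, inherits = FALSE)

# Applying a function to every value: via the keys ...
sums_via_ls <- sapply(ls(reg), function(k) sum(get(k, envir = reg)))
# ... versus eapply(), which walks the environment itself.
# The order of its result is not guaranteed.
sums_direct <- unlist(eapply(reg, sum))
```

Both variants should agree once the results are matched by name;
whether the direct forms are actually faster in a given call site
would need to be measured.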

> Martin
>
>
>     > On Tue, Jan 27, 2015 at 7:44 AM, Peter Haverty
>     > <haverty.peter at gene.com> wrote:
>
>     >> I think that the "sorted" and "all.names" arguments are really only
>     >> appropriate for pretty printing to the screen. I think it is a bit
>     >> unfortunate that environments have a names accessor that is 60X
>     >> slower than all the other types. This is likely due to the history
>     >> of environments, which were originally just for behind-the-scenes
>     >> tasks.
>     >>
>     >> Now that users can use environments as hashes, we really need
>     >> something like a "keys" function. We don't want programmers
>     >> depending on the sorted-ness, as Martin mentioned.  Also, I think it
>     >> helps users when objects share as many of the key API functions as
>     >> possible. "names" is natural. "ls" was certainly confusing for me
>     >> when I started. Having to supply two additional arguments to get the
>     >> desired output doesn't help there.  Think of all the Perl programmers
>     >> struggling to switch to R.  Let's help them out.
>     >> Pete
>     >>
>     >> ____________________
>     >> Peter M. Haverty, Ph.D.
>     >> Genentech, Inc.
>     >> phaverty at gene.com
>     >>
>     >>
>     >> On Tue, Jan 27, 2015 at 7:26 AM, Michael Lawrence
>     >> <lawrence.michael at gene.com> wrote:
>     >> > I think ls(, sorted=FALSE) would be more explicit and thus clearer.
>     >> > There is much precedent for having arguments that request less work
>     >> > to be done, e.g. unlist(use.names=FALSE).  Yes, the extra typing is a
>     >> > bit painful, but there is no intuitive reason why names() would be
>     >> > unsorted, while ls() would be sorted. While it is tempting to use an
>     >> > existing function for this, the word "names" is somewhat loaded. For
>     >> > example, one might expect identical(names(env), names(as.list(env)))
>     >> > to be TRUE. I see no problem with making names() a simple alias of
>     >> > ls(), as long as the behavior is the same. Maybe a different name
>     >> > would be less "loaded" and imply lack of order, something like
>     >> > keySet(). But do we really need this?
>     >> >
>     >> >
>     >> >
>     >> >
>     >> >
>     >> >
>     >> > On Tue, Jan 27, 2015 at 7:11 AM, Martin Maechler
>     >> > <maechler at lynne.stat.math.ethz.ch> wrote:
>     >> >>
>     >> >> >>>>> Peter Haverty <haverty.peter at gene.com>
>     >> >> >>>>>     on Sun, 25 Jan 2015 12:21:04 -0800 writes:
>     >> >>
>     >> >>     > Hi all,
>     >> >>     > The "ls" function wears two hats. It allows users to inspect an
>     >> >>     > environment interactively and also serves deeper in code as the
>     >> >>     > accessor for an environment's names/keys. I propose that we
>     >> >>     > separate these two conflicting goals, keeping ls for interactive
>     >> >>     > use and adding names for a quick listing of the hash keys. This
>     >> >>     > involves adding two lines to do_names in attrib.c.
>     >> >>
>     >> >>     > The 'ls' function and its 'objects' synonym appear very
>     >> >>     > frequently in performance-critical code like base/R/namespace.R
>     >> >>     > and throughout the methods package. These functions are currently
>     >> >>     > among the major contributors to execution time in package loading.
>     >> >>
>     >> >>     > This two-line addition to attrib.c gives a significant speedup
>     >> >>     > for listing an environment's names/keys (2-60X depending on the
>     >> >>     > 'sorted' argument). It also simplifies the environment API by
>     >> >>     > making it more like the other basic types. We already have $ and
>     >> >>     > [[ after all.
>     >> >>
>     >> >>     > Rather than sprinkling sorted=FALSE throughout the methods and
>     >> >>     > base code, let's use names.
>     >> >>
>     >> >> as for list()s and other (generalized) vectors.
>     >> >>
>     >> >> This sounds appealing at first, and I have heard/seen others
>     >> >> propose it.  I see one good reason *not* to allow it (and you
>     >> >> mention the reason by mentioning 'sorted'):
>     >> >>
>     >> >> The contents of an environment are inherently unordered, and
>     >> >> even if the order stays fixed for a while, no code should rely
>     >> >> on the ordering of the objects, and for that reason,
>     >> >>  <env>[1]  etc do not make sense and are not allowed.
>     >> >>
>     >> >>     > Would you be open to this change?
>     >> >>
>     >> >> I'm undecided currently:
>     >> >>  "-": reason above;
>     >> >>  "+": convenience, more compact R code using it;
>     >> >>       very simple and natural change to src/main/attrib.c
>     >> >>
>     >> >> and waiting for other comments, not the least from other members
>     >> >> of R core ..
>     >> >>
>     >> >> Martin Maechler, ETH Zurich
>     >> >>
>     >> >>
>     >> >>     > I have submitted a patch and some timings to the bug tracker as
>     >> >>     > https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16170
>     >> >>
>     >> >>     > Regards,
>     >> >>     > Pete
>     >> >>
>     >> >>     > ____________________
>     >> >>     > Peter M. Haverty, Ph.D.
>     >> >>     > Genentech, Inc.
>     >> >>     > phaverty at gene.com
>     >> >>
>     >> >>     > ______________________________________________
>     >> >>     > R-devel at r-project.org mailing list
>     >> >>     > https://stat.ethz.ch/mailman/listinfo/r-devel
>     >> >>
>     >> >
>     >> >
>     >>
>
