[R] Peculiar behavior of attached objects

Greg Hammett hammett at princeton.edu
Sun Aug 18 18:40:46 CEST 2002


Prof. Ripley, 

Thanks for your quick reply.  It was nice to hear an answer from one of
the experts in the field.  I agree that this behavior of R is as
documented, and found a good summary on p.29 of "An Introduction to R"
at

http://cran.r-project.org/doc/manuals/R-intro.pdf

When I had read the shorter online documentation, I had assumed that the
attach() command was like the "import" command of Python or the "use"
statement of Fortran90, which are very useful for avoiding name clashes
or allowing the usage of short names to refer to variables with longer
names in a package or module.  It still seems to me that requiring an R
user to detach() and attach() after each change of a dataframe variable
in order to be able to access it by a short name is somewhat awkward. 
Perhaps this is such an intrinsic part of the language that it would be
hard to extend, in which case the developers might consider adding a new
command which had more of the flexibility of the Python "import" command
or Fortran90 "use" statement.  (Of course it's easy for me to suggest
such a change, and much harder for someone to actually implement it...)

For example, in Python, one can refer to variables by their fully
qualified module names:

>>> import math
>>> math.pi
3.1415926535897931

or by a shorter name defined inside a module:

>>> from math import *
>>> pi
3.1415926535897931

or if there is a name clash with another variable by the same name it
can be renamed:

>>> from math import pi as pim
>>> pim
3.1415926535897931

Similar things can be done with the Fortran90 "use" statement.  I know
that it does not create a new copy of a variable with separate memory
storage, which would be inefficient and awkward to use; it is just a new
name that points to the same location in memory.
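
One caveat on the Python comparison, shown as a minimal sketch: a name
brought in with "from math import pi as pim" is bound to whatever object
math.pi refers to at import time, so rebinding math.pi afterwards is not
seen through the short name.  The names "original", "short_after" and
"qualified_after" are made up for this illustration, and math.pi is
rebound only to demonstrate the point (it is restored at the end).

```python
import math
from math import pi as pim   # pim is bound to the object math.pi refers to right now

original = math.pi

math.pi = 3.0                # rebind the module attribute (illustration only)
short_after = pim            # the short name still refers to the old object
qualified_after = math.pi    # the qualified name sees the new binding

print(short_after == original)   # True
print(qualified_after)           # 3.0

math.pi = original           # restore the module attribute
```

So the short name here is a second binding rather than a live alias, and
only the qualified name math.pi tracks later reassignments.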

More info at:

http://www.python.org/doc/current/tut/tut.html
http://www.python.org/doc/current/ref/import.html
http://w3.pppl.gov/~hammett/comp/f90tut/f90.tut7.html


As for my "explanation", I had assumed that it was only the action of
assigning to d$y while attach(d) was in effect that created a new
variable called y that was a copy of the original value of the dataframe
variable.  I guess you are telling me that a copy of the whole dataframe
is made every time an attach() command is issued.  I had assumed that
only new pointers with short names were created by an attach() command
(which is what the documentation seemed to imply to me), as it seems
inefficient to make a copy of a whole dataframe at every attach()
command, particularly for a very large database.
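
The behavior described in the reply quoted below can be mimicked with a
small toy model.  This is only a sketch in Python, not a description of
how R is implemented: a dict stands in for the data frame, and a
ChainMap stands in for the search path, with the workspace at position 1
and the attached copy at position 2.  All names here ("workspace",
"attached", "search_path") are hypothetical.

```python
from collections import ChainMap

# Toy model, not R itself: names are looked up in the workspace first,
# then in whatever has been attached.
workspace = {}                    # stands in for the global environment
d = {"y": 10}                     # stands in for d <- data.frame(y=10)
workspace["d"] = d

attached = dict(d)                # attach(d): behaves like a snapshot copy at position 2
search_path = ChainMap(workspace, attached)

d["y"] = 20                       # d$y <- 20 changes only the original
print(search_path["y"])           # 10, the short name still finds the attached copy
print(d["y"])                     # 20
print(sorted(workspace))          # ['d'], y is not in the workspace

attached["y"] = 30                # like assign("y", 30, pos=2)
print(search_path["y"])           # 30
print(d["y"])                     # 20, d itself is untouched
```

In this model the short name "y" never lived in the workspace at all,
which is consistent with it being absent from ls() and disappearing on
detach.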

Thanks,

Greg

----------------------------------------------------------------------------


ripley at stats.ox.ac.uk wrote:
> 
> Sorry, but you mis-read the help page for `attach', and your explanation
> is poppycock.
> 
> > d <- data.frame(y=10)
> > attach(d)
> > d$y <- 20
> > y
> [1] 10
> > find("y")
> [1] "d"
> 
> There is no `new variable': you are still seeing the one in the database
> which is attached. As the help page clearly says, that is not changed but
> a copy in the global environment is.
> 
> You can change the attached copy by direct use of assign():
> > assign("y", 30, pos=2)
> > y
> [1] 30
> 
> but that does not change d.  You can also detach and attach.
> 
> If you find the R documentation terse (it can be) do cross-check the S
> documentation (and this point is in both Venables & Ripley books, too).
> 
> On Sat, 17 Aug 2002, Greg Hammett wrote:
> 
> > I've just discovered R and think it is terrific.  I quickly reproduced
> > results with a few lines of R commands that 7 years ago I had to do with
> > a larger fortran code and many calls to NAG routines.  (I'm mostly a
> > computational plasma physicist, but occasionally delve into statistical
> > analysis of data.)
> >
> > But I've come across a very peculiar behavior of attached objects that
> > cost me hours of searching for a bug, and it would be nice if the R
> > developers could implement a small change to make the language easier to
> > use.
> >
> > The problem was originally buried in a much larger code, but I've boiled
> > it down to a 6 line example:
> >
> > -----------
> >
> > > d <- data.frame(y=10)
> > > attach(d)
> > > d$y <- 20
> >
> > -----------
> >
> > The online help for attach() warns not to assign to the short variable
> > name "y", as that creates a new variable named "y" and the original
> > variable "d$y" remains unchanged.  So I assumed that I could assign to
> > the fully qualified name "d$y", and indeed that successfully changed the
> > value of d$y:
> >
> > -----------
> >
> > > d$y
> > [1] 20
> > > y
> > [1] 10
> > > ls()
> > [1] "d"
> >
> > ------------
> >
> > However, unbeknownst to me at first, it also created a new variable "y"
> > that keeps the original value of "d$y" and no longer points to the
> > present value of "d$y".  Furthermore, this new variable "y" doesn't
> > show up in the list of objects reported by ls()!  (This is unlike the
> > example given in help(attach), where the new variable "height" created
> > by the assignment shows up in the ls() object list.)  If a user assumes
> > that "y" points to the present value of "d$y", as the attach() command
> > usually does, he will have bugs that will be very hard to track down.
> >
> > Although the new variable "y" is hidden from the ls() list of objects,
> > it will be removed by doing a detach("d") command:
> >
> > -----------
> >
> > > detach("d")
> > > y
> > Error: Object "y" not found
> >
> > -----------
> >
> > I can't think of any good reason why R should behave like this.  I've
> > tried this same example in Splus, and was surprised to see that it has
> > the same behavior, so I suppose R at least has compatible
> > peculiarities.  I understand that assigning to a short variable name
> > when attach is operational is supposed to create a new variable instead
> > of modifying the original:
> >
> > > d <- data.frame(y=10)
> > > attach(d)
> > > y <- 20
> > > d$y
> > [1] 10
> >
> > and that a lot of R code might have been written assuming this behavior
> > so it probably shouldn't be changed at this point.  But if one makes an
> > assignment to a fully qualified long variable name, I can't think of any
> > good reason for a new semi-hidden variable to be created.  Thus I think
> > that R should instead do the following:
> >
> > > d <- data.frame(y=10)
> > > attach(d)
> > > d$y <- 20
> > > y
> > [1] 20
> >
> > This seems to me to be a much more natural and intuitive behavior that
> > the user should expect.  Compatibility issues may require adding a
> > switch to allow users to get the old behavior if they really wanted, but
> > I can't think of how any users could have relied on this undocumented
> > "feature"...
> >
> > --------------------------------------------------------------
> >
> > I'm new to R, so perhaps I'm missing something that could be explained
> > to me.  If it is decided not to change R's behavior, then at the least
> > I suggest that the example given by help(attach) be extended by
> > appending the following:
> >
> >      attach(women)
> >      women$height <- height*2.54  ## Don't try to do this either, as it
> >      ## will still create a new variable "height" with the original
> >      ## values of women$height.  I.e., height no longer points to the
> >      ## present value of women$height:
> >
> >      sd(women$height-height)   # shows 6.88709
> >
> >      ## furthermore, this new variable is not listed by ls() and
> >      ## disappears after doing detach("women")
> >      ls()
> >      detach("women")
> >      height   # gives an error message
> >
> >
> > ------------
> > Greg Hammett    hammett at princeton.edu
> > Lecturer with rank of Professor,
> >    Astrophysical Sciences, Princeton University
> > Principal Research Physicist,
> >    Princeton Plasma Physics Laboratory
> > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> > r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> > Send "info", "help", or "[un]subscribe"
> > (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
> >
> 
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272860 (secr)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> 