[R] Write a function that allows access to columns of a passed dataframe.

William Dunlap wdunlap at tibco.com
Tue Dec 6 16:03:07 CET 2016


I basically agree with Rui - using substitute will cause trouble.  E.g., how
would the user iterate over the columns, calling your function for each?
     for(column in names(dataFrame)) func(dataFrame, column)
would fail because dataFrame$column does not exist.  You need to provide
an extra argument to handle this case, something like the following:
     func <- function(df,
         columnAsName,
         columnAsString = deparse(substitute(columnAsName))[1]) {
         ...
     }
The default value of columnAsString should also deal with the case that
the user supplied something like log(Conc.) instead of Conc.
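
As a rough sketch of the idea (colMean and dat are made-up names for
illustration, and stopping with an error when the deparsed text is not a
column name is only one possible way to handle something like log(Conc.)):

```r
# Hypothetical example: return the mean of one column of a data frame.
# columnAsName is never evaluated; it is only captured by substitute().
colMean <- function(df, columnAsName,
                    columnAsString = deparse(substitute(columnAsName))[1]) {
    if (!columnAsString %in% names(df)) {
        stop("no column named '", columnAsString, "' in the data frame")
    }
    mean(df[[columnAsString]])
}

dat <- data.frame(Conc = c(1, 2, 3), Dose = c(10, 20, 30))

# Interactive use: the bare column name is turned into a string.
colMean(dat, Conc)

# Programmatic use: pass the column name explicitly as a string.
for (column in names(dat)) {
    print(colMean(dat, columnAsString = column))
}
```

With this layout the interactive user never types quotation marks, while
code that loops over names(dat) bypasses substitute() entirely.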

I think that using a formula for the lazily evaluated argument (columnAsName)
works well.  The user then knows exactly how it gets evaluated.
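
A minimal sketch of the formula approach, assuming a one-sided formula such
as ~ Conc and made-up names (colMeanF, dat) that are not from the original
post:

```r
# Hypothetical example: the column (or an expression in the columns) is
# given as a one-sided formula, e.g. ~ Conc or ~ log10(Conc).
colMeanF <- function(df, column) {
    if (!inherits(column, "formula") || length(column) != 2) {
        stop("'column' must be a one-sided formula such as ~ Conc")
    }
    # column[[2]] is the expression to the right of the tilde.  Evaluate it
    # with the data frame's columns visible, falling back to the environment
    # in which the formula was created for anything else.
    values <- eval(column[[2]], envir = df, enclos = environment(column))
    mean(values)
}

dat <- data.frame(Conc = c(1, 10, 100), Dose = c(10, 20, 30))

colMeanF(dat, ~ Conc)
colMeanF(dat, ~ log10(Conc))

# Iterating over columns: build the formula from each column name.
for (column in names(dat)) {
    print(colMeanF(dat, reformulate(column)))
}
```

The tilde makes it visible at the call site that the argument is not
evaluated in the usual way, and expressions like log10(Conc) need no
special handling.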


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Dec 6, 2016 at 6:28 AM, John Sorkin <jsorkin at grecc.umaryland.edu>
wrote:

> Over my almost 50 years programming, I have come to believe that if one
> wants a program to be useful, one should write the program to do as much
> work as possible and demand as little as possible from the user of the
> program. In my opinion, one should not ask the person who uses my function
> to remember to put the name of the data frame column in quotation marks.
> The function should be written so that all that needs to be passed is the
> name of the column; the function should take care of the quotation marks.
> Jihny
>
> > John David Sorkin M.D., Ph.D.
> > Professor of Medicine
> > Chief, Biostatistics and Informatics
> > University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
> > Baltimore VA Medical Center
> > 10 North Greene Street
> > GRECC (BT/18/GR)
> > Baltimore, MD 21201-1524
> > (Phone) 410-605-7119
> > (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>
>
> > On Dec 6, 2016, at 3:17 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> >
> > Hello,
> >
> > Just to say that I wouldn't write the function as John did. I would get
> > rid of all the deparse/substitute stuff and instinctively use a quoted
> > argument as a column name. Something like the following.
> >
> > myfun <- function(frame, var){
> >    [...]
> >    col <- frame[, var]  # or frame[[var]]
> >    [...]
> > }
> >
> > myfun(mydf, "age")  # much better, simpler, no promises.
> >
> > Rui Barradas
> >
> > Em 05-12-2016 21:49, Bert Gunter escreveu:
> >> Typo: "lazy evaluation" not "lay evaluation."
> >>
> >> -- Bert
> >>
> >>
> >>
> >> Bert Gunter
> >>
> >> "The trouble with having an open mind is that people keep coming along
> >> and sticking things into it."
> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>
> >>
> >>> On Mon, Dec 5, 2016 at 1:46 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> >>> Sorry, hit "Send" by mistake.
> >>>
> >>> Inline.
> >>>
> >>>
> >>>
> >>>> On Mon, Dec 5, 2016 at 1:34 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> >>>> Inline.
> >>>>
> >>>> -- Bert
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> On Mon, Dec 5, 2016 at 9:53 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> >>>>> Hello,
> >>>>>
> >>>>> Inline.
> >>>>>
> >>>>> Em 05-12-2016 17:09, David Winsemius escreveu:
> >>>>>>
> >>>>>>
> >>>>>>> On Dec 5, 2016, at 7:29 AM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
> >>>>>>>
> >>>>>>> Rui,
> >>>>>>> I appreciate your suggestion, but eliminating the deparse statement does
> >>>>>>> not solve my problem. Do you have any other suggestions? See code below.
> >>>>>>> Thank you,
> >>>>>>> John
> >>>>>>>
> >>>>>>>
> >>>>>>> mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),
> >>>>>>>                    age=c(20,34,43,32,21))
> >>>>>>> mydf
> >>>>>>> class(mydf)
> >>>>>>>
> >>>>>>>
> >>>>>>> myfun <- function(frame,var){
> >>>>>>>   call <- match.call()
> >>>>>>>   print(call)
> >>>>>>>
> >>>>>>>
> >>>>>>>   indx <- match(c("frame","var"),names(call),nomatch=0)
> >>>>>>>   print(indx)
> >>>>>>>   if(indx[1]==0) stop("Function called without sufficient arguments!")
> >>>>>>>
> >>>>>>>
> >>>>>>>   cat("I can get the name of the dataframe as a text string!\n")
> >>>>>>>   #xx <- deparse(substitute(frame))
> >>>>>>>   print(xx)
> >>>>>>>
> >>>>>>>
> >>>>>>>   cat("I can get the name of the column as a text string!\n")
> >>>>>>>   #yy <- deparse(substitute(var))
> >>>>>>>   print(yy)
> >>>>>>>
> >>>>>>>
> >>>>>>>   # This does not work.
> >>>>>>>   print(frame[,var])
> >>>>>>>
> >>>>>>>
> >>>>>>>   # This does not work.
> >>>>>>>   print(frame[,"var"])
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>   # This does not work.
> >>>>>>>   col <- xx[,"yy"]
> >>>>>>>
> >>>>>>>
> >>>>>>>   # Nor does this work.
> >>>>>>>   col <- xx[,yy]
> >>>>>>>   print(col)
> >>>>>>> }
> >>>>>>>
> >>>>>>>
> >>>>>>> myfun(mydf,age)
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> When you use that calling syntax, the system will supply the values of
> >>>>>> whatever the `age` variable contains. (And if there is no `age`-named
> >>>>>> object, you get an error at the time of the call to `myfun`.)
> >>>>>
> >>>>>
> >>>>> Actually, no, which was very surprising to me but John's code worked (not
> >>>>> the function, the call). And with the change I've proposed, it worked
> >>>>> flawlessly. No errors. Why I don't know.
> >>>
> >>> See ?substitute and in particular the example highlighted there.
> >>>
> >>> The technical details are explained in the R Language Definition
> >>> manual. The key here is the use of promises for lay evaluations. In
> >>> fact, the expression in the call *is* available within the functions,
> >>> as is (a pointer to) the environment in which to evaluate the
> >>> expression. That is how substitute() works. Specifically, quoting from
> >>> the manual,
> >>>
> >>> *****
> >>> It is possible to access the actual (not default) expressions used as
> >>> arguments inside the function. The mechanism is implemented via
> >>> promises. When a function is being evaluated the actual expression
> >>> used as an argument is stored in the promise together with a pointer
> >>> to the environment the function was called from. When (if) the
> >>> argument is evaluated the stored expression is evaluated in the
> >>> environment that the function was called from. Since only a pointer to
> >>> the environment is used any changes made to that environment will be
> >>> in effect during this evaluation. The resulting value is then also
> >>> stored in a separate spot in the promise. Subsequent evaluations
> >>> retrieve this stored value (a second evaluation is not carried out).
> >>> Access to the unevaluated expression is also available using
> >>> substitute.
> >>> ********
> >>>
> >>> -- Bert
> >>>
> >>>
> >>>
> >>>
> >>>>>
> >>>>> Rui Barradas
> >>>>>
> >>>>>  You need either to call it as:
> >>>>>>
> >>>>>>
> >>>>>> myfun( mydf , "age")
> >>>>>>
> >>>>>>
> >>>>>> # Or:
> >>>>>>
> >>>>>> age <- "age"
> >>>>>> myfun( mydf, age)
> >>>>>>
> >>>>>> Unless your value of the `age`-named variable was "age" in the calling
> >>>>>> environment (and you did not give us that value in either of your postings),
> >>>>>> you would fail.
> >>>>>>
> >>>>>