[R] Write a function that allows access to columns of a passed dataframe.

Rui Barradas ruipbarradas at sapo.pt
Tue Dec 6 16:33:40 CET 2016


Perhaps the best way is the one used by library(), where both 
library(package) and library("package") work. It uses 
as.character/substitute, not deparse/substitute, as follows.

mydf <- data.frame(id = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))
mydf
class(mydf)
str(mydf)

myfun <- function(frame,var){
	yy <- as.character(substitute(var))
	frame[, yy]
}

myfun(mydf, age)
myfun(mydf, "age")
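
As a quick check of this behaviour, and of the caveat Bill raises below,
here is a small sketch. The variable `colname` is made up for the
illustration; everything else is the code above.

```r
mydf <- data.frame(id = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

myfun <- function(frame, var) {
  # substitute() returns the unevaluated argument; as.character() turns
  # either the symbol age or the string "age" into the string "age"
  yy <- as.character(substitute(var))
  frame[, yy]
}

# both calling forms select the same column
identical(myfun(mydf, age), myfun(mydf, "age"))

# Caveat: a variable that holds the column name is NOT looked up.
# The function sees the symbol colname, not its value "age".
colname <- "age"   # hypothetical variable, for illustration only
res <- try(myfun(mydf, colname), silent = TRUE)
inherits(res, "try-error")   # the call fails: no column is named "colname"
```

So the unquoted form is convenient at the prompt, but it gets in the way
when the column name is computed or looped over.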

Rui Barradas

Em 06-12-2016 15:03, William Dunlap escreveu:
> I basically agree with Rui - using substitute will cause trouble.  E.g., how
> would the user iterate over the columns, calling your function for each?
>       for(column in dataFrame) func(column)
> would fail because dataFrame$column does not exist.  You need to provide
> an extra argument to handle this case, something like the following:
>       func <- function(df,
>           columnAsName,
>           columnAsString = deparse(substitute(columnAsName))[1]) {
>           ...
>       }
> The default value of columnAsString should also deal with the case that
> the user supplied something like log(Conc.) instead of Conc.
>
> I think that using a formula for the lazily evaluated argument (columnAsName)
> works well.  The user then knows exactly how it gets evaluated.
>
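
Bill's two-argument idea could be filled in roughly as follows. This is
only one possible reading of the "..." in his outline, not his code: the
body (returning the column) and the example calls are assumptions.

```r
mydf <- data.frame(id = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

# columnAsName is never evaluated; it is only inspected via substitute().
# columnAsString defaults to the deparsed expression but can be given directly.
func <- function(df, columnAsName,
                 columnAsString = deparse(substitute(columnAsName))[1]) {
  # assumed body: return the requested column
  df[[columnAsString]]
}

func(mydf, age)   # interactive use, unquoted name

# programmatic use: iterate over the column names as strings
for (column in names(mydf)) {
  print(func(mydf, columnAsString = column))
}
```

Note that with deparse() a quoted name passed as the second argument does
not work: func(mydf, "age") deparses to a string that still contains the
quote characters, so no column matches and NULL is returned. That is the
reason for using as.character() in the version above.
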
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Tue, Dec 6, 2016 at 6:28 AM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
>
>     Over my almost 50 years programming, I have come to believe that if
>     one wants a program to be useful, one should write the program to do
>     as much work as possible and demand as little as possible from the
>     user of the program. In my opinion, one should not ask the person
>     who uses my function to remember to put the name of the data frame
>     column in quotation marks. The function should be written so that
>     all that needs to be passed is the name of the column; the function
>     should take care of the quotation marks.
>     Jihny
>
>     > John David Sorkin M.D., Ph.D.
>     > Professor of Medicine
>     > Chief, Biostatistics and Informatics
>     > University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
>     > Baltimore VA Medical Center
>     > 10 North Greene Street
>     > GRECC (BT/18/GR)
>     > Baltimore, MD 21201-1524
>     > (Phone) 410-605-7119
>     > (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>
>
>      > On Dec 6, 2016, at 3:17 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>      >
>      > Hello,
>      >
>      > Just to say that I wouldn't write the function as John did. I would get
>      > rid of all the deparse/substitute stuff and instinctively use a quoted
>      > argument as a column name. Something like the following.
>      >
>      > myfun <- function(frame, var){
>      >    [...]
>      >    col <- frame[, var]  # or frame[[var]]
>      >    [...]
>      > }
>      >
>      > myfun(mydf, "age")  # much better, simpler, no promises.
>      >
>      > Rui Barradas
>      >
>      > Em 05-12-2016 21:49, Bert Gunter escreveu:
>      >> Typo: "lazy evaluation" not "lay evaluation."
>      >>
>      >> -- Bert
>      >>
>      >>
>      >>
>      >> Bert Gunter
>      >>
>      >> "The trouble with having an open mind is that people keep coming along
>      >> and sticking things into it."
>      >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>      >>
>      >>
>      >>> On Mon, Dec 5, 2016 at 1:46 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>      >>> Sorry, hit "Send" by mistake.
>      >>>
>      >>> Inline.
>      >>>
>      >>>
>      >>>
>      >>>> On Mon, Dec 5, 2016 at 1:34 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>      >>>> Inline.
>      >>>>
>      >>>> -- Bert
>      >>>>
>      >>>>
>      >>>>
>      >>>>> On Mon, Dec 5, 2016 at 9:53 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>      >>>>> Hello,
>      >>>>>
>      >>>>> Inline.
>      >>>>>
>      >>>>> Em 05-12-2016 17:09, David Winsemius escreveu:
>      >>>>>>
>      >>>>>>
>      >>>>>>> On Dec 5, 2016, at 7:29 AM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
>      >>>>>>>
>      >>>>>>> Rui,
>      >>>>>>> I appreciate your suggestion, but eliminating the deparse statement does
>      >>>>>>> not solve my problem. Do you have any other suggestions? See code below.
>      >>>>>>> Thank you,
>      >>>>>>> John
>      >>>>>>>
>      >>>>>>>
>      >>>>>>> mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
>      >>>>>>> mydf
>      >>>>>>> class(mydf)
>      >>>>>>>
>      >>>>>>>
>      >>>>>>> myfun <- function(frame,var){
>      >>>>>>>   call <- match.call()
>      >>>>>>>   print(call)
>      >>>>>>>
>      >>>>>>>
>      >>>>>>>   indx <- match(c("frame","var"),names(call),nomatch=0)
>      >>>>>>>   print(indx)
>      >>>>>>>   if(indx[1]==0) stop("Function called without sufficient arguments!")
>      >>>>>>>
>      >>>>>>>
>      >>>>>>>   cat("I can get the name of the dataframe as a text string!\n")
>      >>>>>>>   #xx <- deparse(substitute(frame))
>      >>>>>>>   print(xx)
>      >>>>>>>
>      >>>>>>>
>      >>>>>>>   cat("I can get the name of the column as a text string!\n")
>      >>>>>>>   #yy <- deparse(substitute(var))
>      >>>>>>>   print(yy)
>      >>>>>>>
>      >>>>>>>
>      >>>>>>>   # This does not work.
>      >>>>>>>   print(frame[,var])
>      >>>>>>>
>      >>>>>>>
>      >>>>>>>   # This does not work.
>      >>>>>>>   print(frame[,"var"])
>      >>>>>>>
>      >>>>>>>
>      >>>>>>>
>      >>>>>>>
>      >>>>>>>   # This does not work.
>      >>>>>>>   col <- xx[,"yy"]
>      >>>>>>>
>      >>>>>>>
>      >>>>>>>   # Nor does this work.
>      >>>>>>>   col <- xx[,yy]
>      >>>>>>>   print(col)
>      >>>>>>> }
>      >>>>>>>
>      >>>>>>>
>      >>>>>>> myfun(mydf,age)
>      >>>>>>
>      >>>>>>
>      >>>>>>
>      >>>>>> When you use that calling syntax, the system will supply the values of
>      >>>>>> whatever the `age` variable contains. (And if there is no `age`-named
>      >>>>>> object, you get an error at the time of the call to `myfun`.)
>      >>>>>
>      >>>>>
>      >>>>> Actually, no, which was very surprising to me but John's code worked (not
>      >>>>> the function, the call). And with the change I've proposed, it worked
>      >>>>> flawlessly. No errors. Why I don't know.
>      >>>
>      >>> See ?substitute and in particular the example highlighted there.
>      >>>
>      >>> The technical details are explained in the R Language Definition
>      >>> manual. The key here is the use of promises for lay evaluations. In
>      >>> fact, the expression in the call *is* available within the functions,
>      >>> as is (a pointer to) the environment in which to evaluate the
>      >>> expression. That is how substitute() works. Specifically, quoting from
>      >>> the manual,
>      >>>
>      >>> *****
>      >>> It is possible to access the actual (not default) expressions used as
>      >>> arguments inside the function. The mechanism is implemented via
>      >>> promises. When a function is being evaluated the actual expression
>      >>> used as an argument is stored in the promise together with a pointer
>      >>> to the environment the function was called from. When (if) the
>      >>> argument is evaluated the stored expression is evaluated in the
>      >>> environment that the function was called from. Since only a pointer to
>      >>> the environment is used any changes made to that environment will be
>      >>> in effect during this evaluation. The resulting value is then also
>      >>> stored in a separate spot in the promise. Subsequent evaluations
>      >>> retrieve this stored value (a second evaluation is not carried out).
>      >>> Access to the unevaluated expression is also available using
>      >>> substitute.
>      >>> ********
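
A small sketch of what the manual passage describes. The names
`show_promise`, `bump`, `counter`, `name_only` and `no_such_object` are
made up for the illustration.

```r
show_promise <- function(arg) {
  expr <- substitute(arg)   # the unevaluated expression stored in the promise
  cat("expression:", deparse(expr), "\n")
  first  <- arg             # forces the promise, evaluated in the caller's environment
  second <- arg             # reuses the stored value, no second evaluation
  c(first, second)
}

# a function with a visible side effect, to count how often it is evaluated
counter <- 0
bump <- function() {
  counter <<- counter + 1
  counter
}

out <- show_promise(bump())
out       # both elements are the same stored value
counter   # bump() ran only once

# An argument that is only inspected with substitute() is never evaluated,
# so a name with no matching object causes no error at the time of the call.
name_only <- function(var) as.character(substitute(var))
name_only(no_such_object)
```

This is why the call myfun(mydf, age) in John's post does not fail at the
call itself even though no object named age exists: nothing is evaluated
until the function body actually uses the argument.
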
>      >>>
>      >>> -- Bert
>      >>>
>      >>>
>      >>>
>      >>>
>      >>>>>
>      >>>>> Rui Barradas
>      >>>>>
>      >>>>>  You need either to call it as:
>      >>>>>>
>      >>>>>>
>      >>>>>> myfun( mydf , "age")
>      >>>>>>
>      >>>>>>
>      >>>>>> # Or:
>      >>>>>>
>      >>>>>> age <- "age"
>      >>>>>> myfun( mydf, age)
>      >>>>>>
>      >>>>>> Unless your value of the `age`-named variable was "age" in the calling
>      >>>>>> environment (and you did not give us that value in either of your postings),
>      >>>>>> you would fail.
>      >>>>>>
>      >>>>>
>      >>>>> ______________________________________________
>      >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>      >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>      >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>      >>>>> and provide commented, minimal, self-contained, reproducible code.


