[R] Write a function that allows access to columns of a passed dataframe.

William Dunlap wdunlap at tibco.com
Tue Dec 6 16:57:46 CET 2016


Note that library has another argument, character.only=TRUE/FALSE,
to control whether the main argument should be regarded as a variable
or a literal.  I think you need two arguments to handle this.
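
A minimal sketch of that two-argument idea, assuming a hypothetical helper
get_column() whose character.only flag is modeled on library()'s (the helper
name and the stopifnot() check are illustrative, not something from this
thread):

```r
# Hypothetical helper: `character.only` plays the same role as in library().
get_column <- function(frame, var, character.only = FALSE) {
  if (!character.only) {
    # Treat `var` as a literal: capture the unevaluated argument expression.
    var <- as.character(substitute(var))
  }
  # Otherwise `var` is evaluated as usual and must hold the column name.
  # An expression such as log(age) yields more than one string and is rejected.
  stopifnot(is.character(var), length(var) == 1L)
  frame[[var]]
}

mydf <- data.frame(id = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

get_column(mydf, age)    # bare name, taken literally
get_column(mydf, "age")  # quoted name, also taken literally

# Iterating over columns needs the variable to be evaluated, not quoted.
for (column in names(mydf)) {
  print(get_column(mydf, column, character.only = TRUE))
}
```

With the default, a loop variable such as `column` would be read as the
literal column name "column", which is why the second argument is needed.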

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Dec 6, 2016 at 7:33 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:

> Perhaps the best way is the one used by library(), where both
> library(package) and library("package") work. It uses
> as.character/substitute, not deparse/substitute, as follows.
>
> mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
> mydf
> class(mydf)
> str(mydf)
>
> myfun <- function(frame,var){
>         yy <- as.character(substitute(var))
>         frame[, yy]
> }
>
> myfun(mydf, age)
> myfun(mydf, "age")
>
> Rui Barradas
>
> Em 06-12-2016 15:03, William Dunlap escreveu:
>
>> I basically agree with Rui - using substitute will cause trouble.  E.g., how
>> would the user iterate over the columns, calling your function for each?
>>       for(column in names(dataFrame)) func(dataFrame, column)
>> would fail because dataFrame$column does not exist.  You need to provide
>> an extra argument to handle this case, something like the following:
>>       func <- function(df,
>>           columnAsName,
>>           columnAsString = deparse(substitute(columnAsName))[1]) {
>>           ...
>>       }
>> The default value of columnAsString should also deal with the case that
>> the user supplied something like log(Conc.) instead of Conc.
>>
>> I think that using a formula for the lazily evaluated argument (columnAsName)
>> works well.  The user then knows exactly how it gets evaluated.
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com <http://tibco.com>
>>
>> On Tue, Dec 6, 2016 at 6:28 AM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
>>
>>     Over my almost 50 years programming, I have come to believe that if
>>     one wants a program to be useful, one should write the program to do
>>     as much work as possible and demand as little as possible from the
>>     user of the program. In my opinion, one should not ask the person
>>     who uses my function to remember to put the name of the data frame
>>     column in quotation marks. The function should be written so that
>>     all that needs to be passed is the name of the column; the function
>>     should take care of the quotation marks.
>>     John
>>
>>     > John David Sorkin M.D., Ph.D.
>>     > Professor of Medicine
>>     > Chief, Biostatistics and Informatics
>>     > University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
>>     > Baltimore VA Medical Center
>>     > 10 North Greene Street
>>     > GRECC (BT/18/GR)
>>     > Baltimore, MD 21201-1524
>>     > (Phone) 410-605-7119
>>     > (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>>
>>
>>      > On Dec 6, 2016, at 3:17 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>>      >
>>      > Hello,
>>      >
>>      > Just to say that I wouldn't write the function as John did. I would get
>>      > rid of all the deparse/substitute stuff and instinctively use a quoted
>>      > argument as a column name. Something like the following.
>>      >
>>      > myfun <- function(frame, var){
>>      >    [...]
>>      >    col <- frame[, var]  # or frame[[var]]
>>      >    [...]
>>      > }
>>      >
>>      > myfun(mydf, "age")  # much better, simpler, no promises.
>>      >
>>      > Rui Barradas
>>      >
>>      > Em 05-12-2016 21:49, Bert Gunter escreveu:
>>      >> Typo: "lazy evaluation" not "lay evaluation."
>>      >>
>>      >> -- Bert
>>      >>
>>      >>
>>      >>
>>      >> Bert Gunter
>>      >>
>>      >> "The trouble with having an open mind is that people keep coming along
>>      >> and sticking things into it."
>>      >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>      >>
>>      >>
>>      >>> On Mon, Dec 5, 2016 at 1:46 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>      >>> Sorry, hit "Send" by mistake.
>>      >>>
>>      >>> Inline.
>>      >>>
>>      >>>
>>      >>>
>>      >>>> On Mon, Dec 5, 2016 at 1:34 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>      >>>> Inline.
>>      >>>>
>>      >>>> -- Bert
>>      >>>>
>>      >>>>
>>      >>>>> On Mon, Dec 5, 2016 at 9:53 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>>      >>>>> Hello,
>>      >>>>>
>>      >>>>> Inline.
>>      >>>>>
>>      >>>>> Em 05-12-2016 17:09, David Winsemius escreveu:
>>      >>>>>>
>>      >>>>>>
>>      >>>>>>> On Dec 5, 2016, at 7:29 AM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
>>      >>>>>>>
>>      >>>>>>> Rui,
>>      >>>>>>> I appreciate your suggestion, but eliminating the deparse statement does
>>      >>>>>>> not solve my problem. Do you have any other suggestions? See code below.
>>      >>>>>>> Thank you,
>>      >>>>>>> John
>>      >>>>>>>
>>      >>>>>>>
>>      >>>>>>> mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
>>      >>>>>>> mydf
>>      >>>>>>> class(mydf)
>>      >>>>>>>
>>      >>>>>>>
>>      >>>>>>> myfun <- function(frame,var){
>>      >>>>>>>   call <- match.call()
>>      >>>>>>>   print(call)
>>      >>>>>>>
>>      >>>>>>>
>>      >>>>>>>   indx <- match(c("frame","var"),names(call),nomatch=0)
>>      >>>>>>>   print(indx)
>>      >>>>>>>   if(indx[1]==0) stop("Function called without sufficient arguments!")
>>      >>>>>>>
>>      >>>>>>>
>>      >>>>>>>   cat("I can get the name of the dataframe as a text string!\n")
>>      >>>>>>>   #xx <- deparse(substitute(frame))
>>      >>>>>>>   print(xx)
>>      >>>>>>>
>>      >>>>>>>
>>      >>>>>>>   cat("I can get the name of the column as a text string!\n")
>>      >>>>>>>   #yy <- deparse(substitute(var))
>>      >>>>>>>   print(yy)
>>      >>>>>>>
>>      >>>>>>>
>>      >>>>>>>   # This does not work.
>>      >>>>>>>   print(frame[,var])
>>      >>>>>>>
>>      >>>>>>>
>>      >>>>>>>   # This does not work.
>>      >>>>>>>   print(frame[,"var"])
>>      >>>>>>>
>>      >>>>>>>
>>      >>>>>>>
>>      >>>>>>>
>>      >>>>>>>   # This does not work.
>>      >>>>>>>   col <- xx[,"yy"]
>>      >>>>>>>
>>      >>>>>>>
>>      >>>>>>>   # Nor does this work.
>>      >>>>>>>   col <- xx[,yy]
>>      >>>>>>>   print(col)
>>      >>>>>>> }
>>      >>>>>>>
>>      >>>>>>>
>>      >>>>>>> myfun(mydf,age)
>>      >>>>>>
>>      >>>>>>
>>      >>>>>>
>>      >>>>>> When you use that calling syntax, the system will supply the values of
>>      >>>>>> whatever the `age` variable contains. (And if there is no `age`-named
>>      >>>>>> object, you get an error at the time of the call to `myfun`.)
>>      >>>>>
>>      >>>>>
>>      >>>>> Actually, no, which was very surprising to me but John's code worked (not
>>      >>>>> the function, the call). And with the change I've proposed, it worked
>>      >>>>> flawlessly. No errors. Why, I don't know.
>>      >>>
>>      >>> See ?substitute and in particular the example highlighted there.
>>      >>>
>>      >>> The technical details are explained in the R Language Definition
>>      >>> manual. The key here is the use of promises for lay evaluations. In
>>      >>> fact, the expression in the call *is* available within the functions,
>>      >>> as is (a pointer to) the environment in which to evaluate the
>>      >>> expression. That is how substitute() works. Specifically, quoting from
>>      >>> the manual,
>>      >>>
>>      >>> *****
>>      >>> It is possible to access the actual (not default) expressions used as
>>      >>> arguments inside the function. The mechanism is implemented via
>>      >>> promises. When a function is being evaluated the actual expression
>>      >>> used as an argument is stored in the promise together with a pointer
>>      >>> to the environment the function was called from. When (if) the
>>      >>> argument is evaluated the stored expression is evaluated in the
>>      >>> environment that the function was called from. Since only a pointer to
>>      >>> the environment is used any changes made to that environment will be
>>      >>> in effect during this evaluation. The resulting value is then also
>>      >>> stored in a separate spot in the promise. Subsequent evaluations
>>      >>> retrieve this stored value (a second evaluation is not carried out).
>>      >>> Access to the unevaluated expression is also available using
>>      >>> substitute.
>>      >>> ********
>>      >>>
>>      >>> -- Bert
>>      >>>
>>      >>>
>>      >>>
>>      >>>
>>      >>>>>
>>      >>>>> Rui Barradas
>>      >>>>>
>>      >>>>>  You need either to call it as:
>>      >>>>>>
>>      >>>>>>
>>      >>>>>> myfun( mydf , "age")
>>      >>>>>>
>>      >>>>>>
>>      >>>>>> # Or:
>>      >>>>>>
>>      >>>>>> age <- "age"
>>      >>>>>> myfun( mydf, age)
>>      >>>>>>
>>      >>>>>> Unless your value of the `age`-named variable was "age" in the calling
>>      >>>>>> environment (and you did not give us that value in either of your postings),
>>      >>>>>> you would fail.
>>      >>>>>>
>>      >>>>>
>>      >>>>> ______________________________________________
>>      >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>      >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>      >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>      >>>>> and provide commented, minimal, self-contained, reproducible code.
>>
