[R] Write a function that allows access to columns of a passed dataframe.

David Winsemius dwinsemius at comcast.net
Tue Dec 6 19:41:52 CET 2016


> On Dec 6, 2016, at 7:33 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> 
> Perhaps the best way is the one used by library(), where both library(package) and library("package") work. It uses as.character/substitute, not deparse/substitute, as follows.
> 
> mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
> mydf
> class(mydf)
> str(mydf)
> 
> myfun <- function(frame,var){
> 	yy <- as.character(substitute(var))
> 	frame[, yy]
> }
> 
> myfun(mydf, age)
> myfun(mydf, "age")
> 
> Rui Barradas
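
A quick check of what this buys and where it stops working. This is only a sketch; the `column` variable and the tryCatch() wrapper are mine, added for illustration. Both calling styles give the same result, but a variable that holds the column name is not looked up, because substitute() returns the symbol itself:

```r
mydf <- data.frame(id = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

myfun <- function(frame, var) {
  # substitute() captures the argument unevaluated; as.character() turns
  # either the symbol `age` or the string "age" into "age"
  yy <- as.character(substitute(var))
  frame[, yy]
}

# Quoted and unquoted calls reach the same column.
same <- identical(myfun(mydf, age), myfun(mydf, "age"))

# Illustrative assumption: a variable holding the column name.
# substitute() sees the symbol `column`, so the function looks for a
# column literally named "column" and fails.
column <- "age"
res <- tryCatch(myfun(mydf, column),
                error = function(e) conditionMessage(e))
```

Here `same` is TRUE, while `res` holds an error message rather than the age values. That is the iteration problem Bill describes below.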
> 
> Em 06-12-2016 15:03, William Dunlap escreveu:
>> I basically agree with Rui - using substitute will cause trouble.  E.g., how
>> would the user iterate over the columns, calling your function for each?
>>      for(column in names(dataFrame)) func(dataFrame, column)
>> would fail because dataFrame$column does not exist.  You need to provide
>> an extra argument to handle this case. something like the following:
>>      func <- function(df,
>>          columnAsName,
>>          columnAsString = deparse(substitute(columnAsName))[1]) {
>>          ...
>>      }
>> The default value of columnAsString should also deal with the case that
>> the user supplied something like log(Conc.) instead of Conc.
>> 
>> I think that using a formula for the lazily evaluated argument
>> (columnAsName)
>> works well.  The user then knows exactly how it gets evaluated.
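
As a rough sketch of the extra-argument idea (the function body is my assumption, since the original leaves it as `...`), the default value handles the unquoted interactive call and the explicit string argument handles iteration:

```r
mydf <- data.frame(id = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

func <- function(df,
                 columnAsName,
                 columnAsString = deparse(substitute(columnAsName))[1]) {
  # Assumed body: simply return the requested column.
  # columnAsName itself is never evaluated, only deparsed by the default.
  df[[columnAsString]]
}

# Interactive style: unquoted name, the default builds the string "age".
by_name <- func(mydf, age)

# Programmatic style: pass the string explicitly while looping.
n_values <- integer(0)
for (column in names(mydf)) {
  n_values[column] <- length(func(mydf, columnAsString = column))
}
```

Note that this sketch does not yet deal with a call such as func(mydf, log(age)): the default would deparse to "log(age)", which is not a column name, so `[[` would return NULL.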

Here is an implementation of the formula approach that supports multi-column extraction:

mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
mydf
class(mydf)
str(mydf)

myfun <- function(frame, vars){
	yy <- terms(vars)
	frame[, attr(yy, "term.labels")]
}

myfun(mydf, ~age+sex)
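
One possible variation, offered as a sketch rather than a recommendation: the hypothetical myfun2() below uses all.vars() instead of the term labels, so a formula such as ~log(age) still resolves to the age column, and drop = FALSE keeps the result a data frame even when a single column is requested. The error message and the use of reformulate() for iteration are my additions.

```r
mydf <- data.frame(id = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

myfun2 <- function(frame, vars) {
  stopifnot(inherits(vars, "formula"))
  # all.vars() returns the variable names used in the formula,
  # ignoring any function calls wrapped around them
  cols <- all.vars(vars)
  missing_cols <- setdiff(cols, names(frame))
  if (length(missing_cols) > 0) {
    stop("columns not found: ", paste(missing_cols, collapse = ", "))
  }
  # drop = FALSE: always return a data frame, even for one column
  frame[, cols, drop = FALSE]
}

two_cols <- myfun2(mydf, ~ age + sex)
one_col  <- myfun2(mydf, ~ age)
logged   <- myfun2(mydf, ~ log(age))   # resolves to the `age` column

# Iterating over column names: build a one-sided formula from each string.
seen <- character(0)
for (nm in names(mydf)) {
  seen <- c(seen, names(myfun2(mydf, reformulate(nm))))
}
```

With the term.labels version above, ~log(age) would ask for a column named "log(age)" and fail, which is the case Bill raised about log(Conc.).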


>> 
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>> 
>> On Tue, Dec 6, 2016 at 6:28 AM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
>> 
>>    Over my almost 50 years programming, I have come to believe that if
>>    one wants a program to be useful, one should write the program to do
>>    as much work as possible and demand as little as possible from the
>>    user of the program. In my opinion, one should not ask the person
>>    who uses my function to remember to put the name of the data frame
>>    column in quotation marks. The function should be written so that
>>    all that needs to be passed is the name of the column; the function
>>    should take care of the quotation marks.
>>    John
>> 
>>    > John David Sorkin M.D., Ph.D.
>>    > Professor of Medicine
>>    > Chief, Biostatistics and Informatics
>>    > University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
>>    > Baltimore VA Medical Center
>>    > 10 North Greene Street
>>    > GRECC (BT/18/GR)
>>    > Baltimore, MD 21201-1524
>>    > (Phone)410-605-7119
>>    > (Fax)410-605-7913 (Please call phone number above prior to faxing)
>> 
>> 
>>     > On Dec 6, 2016, at 3:17 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>>     >
>>     > Hello,
>>     >
>>     > Just to say that I wouldn't write the function as John did. I would
>>     > get rid of all the deparse/substitute stuff and instinctively use a
>>     > quoted argument as a column name. Something like the following.
>>     >
>>     > myfun <- function(frame, var){
>>     >    [...]
>>     >    col <- frame[, var]  # or frame[[var]]
>>     >    [...]
>>     > }
>>     >
>>     > myfun(mydf, "age")  # much better, simpler, no promises.
>>     >
>>     > Rui Barradas
>>     >
>>     > Em 05-12-2016 21:49, Bert Gunter escreveu:
>>     >> Typo: "lazy evaluation" not "lay evaluation."
>>     >>
>>     >> -- Bert
>>     >>
>>     >>
>>     >>
>>     >> Bert Gunter
>>     >>
>>     >> "The trouble with having an open mind is that people keep coming along
>>     >> and sticking things into it."
>>     >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>     >>
>>     >>
>>     >>> On Mon, Dec 5, 2016 at 1:46 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>     >>> Sorry, hit "Send" by mistake.
>>     >>>
>>     >>> Inline.
>>     >>>
>>     >>>
>>     >>>
>>     >>>> On Mon, Dec 5, 2016 at 1:34 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>     >>>> Inline.
>>     >>>>
>>     >>>> -- Bert
>>     >>>>
>>     >>>>
>>     >>>> Bert Gunter
>>     >>>>
>>     >>>> "The trouble with having an open mind is that people keep coming along
>>     >>>> and sticking things into it."
>>     >>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>     >>>>
>>     >>>>
>>     >>>>> On Mon, Dec 5, 2016 at 9:53 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>>     >>>>> Hello,
>>     >>>>>
>>     >>>>> Inline.
>>     >>>>>
>>     >>>>> Em 05-12-2016 17:09, David Winsemius escreveu:
>>     >>>>>>
>>     >>>>>>
>>     >>>>>>> On Dec 5, 2016, at 7:29 AM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
>>     >>>>>>>
>>     >>>>>>> Rui,
>>     >>>>>>> I appreciate your suggestion, but eliminating the deparse statement
>>     >>>>>>> does not solve my problem. Do you have any other suggestions? See
>>     >>>>>>> code below.
>>     >>>>>>> Thank you,
>>     >>>>>>> John
>>     >>>>>>>
>>     >>>>>>>
>>     >>>>>>> mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
>>     >>>>>>> mydf
>>     >>>>>>> class(mydf)
>>     >>>>>>>
>>     >>>>>>> myfun <- function(frame,var){
>>     >>>>>>>   call <- match.call()
>>     >>>>>>>   print(call)
>>     >>>>>>>
>>     >>>>>>>   indx <- match(c("frame","var"),names(call),nomatch=0)
>>     >>>>>>>   print(indx)
>>     >>>>>>>   if(indx[1]==0) stop("Function called without sufficient arguments!")
>>     >>>>>>>
>>     >>>>>>>   cat("I can get the name of the dataframe as a text string!\n")
>>     >>>>>>>   #xx <- deparse(substitute(frame))
>>     >>>>>>>   print(xx)
>>     >>>>>>>
>>     >>>>>>>   cat("I can get the name of the column as a text string!\n")
>>     >>>>>>>   #yy <- deparse(substitute(var))
>>     >>>>>>>   print(yy)
>>     >>>>>>>
>>     >>>>>>>   # This does not work.
>>     >>>>>>>   print(frame[,var])
>>     >>>>>>>
>>     >>>>>>>   # This does not work.
>>     >>>>>>>   print(frame[,"var"])
>>     >>>>>>>
>>     >>>>>>>   # This does not work.
>>     >>>>>>>   col <- xx[,"yy"]
>>     >>>>>>>
>>     >>>>>>>   # Nor does this work.
>>     >>>>>>>   col <- xx[,yy]
>>     >>>>>>>   print(col)
>>     >>>>>>> }
>>     >>>>>>>
>>     >>>>>>> myfun(mydf,age)
>>     >>>>>>
>>     >>>>>>
>>     >>>>>>
>>     >>>>>> When you use that calling syntax, the system will supply the values
>>     >>>>>> of whatever the `age` variable contains. (And if there is no
>>     >>>>>> `age`-named object, you get an error at the time of the call to `myfun`.)
>>     >>>>>
>>     >>>>>
>>     >>>>> Actually, no, which was very surprising to me but John's code worked
>>     >>>>> (not the function, the call). And with the change I've proposed, it
>>     >>>>> worked flawlessly. No errors. Why I don't know.
>>     >>>
>>     >>> See ?substitute and in particular the example highlighted there.
>>     >>>
>>     >>> The technical details are explained in the R Language Definition
>>     >>> manual. The key here is the use of promises for lay evaluations. In
>>     >>> fact, the expression in the call *is* available within the functions,
>>     >>> as is (a pointer to) the environment in which to evaluate the
>>     >>> expression. That is how substitute() works. Specifically, quoting from
>>     >>> the manual,
>>     >>>
>>     >>> *****
>>     >>> It is possible to access the actual (not default) expressions used as
>>     >>> arguments inside the function. The mechanism is implemented via
>>     >>> promises. When a function is being evaluated the actual expression
>>     >>> used as an argument is stored in the promise together with a pointer
>>     >>> to the environment the function was called from. When (if) the
>>     >>> argument is evaluated the stored expression is evaluated in the
>>     >>> environment that the function was called from. Since only a pointer to
>>     >>> the environment is used any changes made to that environment will be
>>     >>> in effect during this evaluation. The resulting value is then also
>>     >>> stored in a separate spot in the promise. Subsequent evaluations
>>     >>> retrieve this stored value (a second evaluation is not carried out).
>>     >>> Access to the unevaluated expression is also available using
>>     >>> substitute.
>>     >>> ********
>>     >>>
>>     >>> -- Bert
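
A small illustration of the passage above may help (the function names are made up for the example). It shows that substitute() retrieves the expression without evaluating it, that an argument that is never forced causes no error even when no such object exists, and that a promise is evaluated only once:

```r
# substitute() returns the expression; using `arg` forces the promise.
show_promise <- function(arg) {
  expr <- substitute(arg)                  # nothing is evaluated here
  list(expression = deparse(expr),
       value = arg)                        # evaluation happens here
}
a <- 2
out <- show_promise(a + 1)

# The argument is never forced, so a nonexistent object is not a problem.
never_forced <- function(arg) deparse(substitute(arg))
label <- never_forced(no_such_object)

# The promise is evaluated once; the second use retrieves the stored value.
counter <- 0
twice <- function(arg) {
  arg
  arg
  invisible(NULL)
}
twice({ counter <- counter + 1 })
```

After running this, out$expression is "a + 1" and out$value is 3, `label` is "no_such_object", and `counter` is 1 rather than 2. The second case is why the call myfun(mydf, age) itself did not fail: the error can only appear at the point where the function body actually evaluates `var`.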
>>     >>>
>>     >>>
>>     >>>
>>     >>>
>>     >>>>>
>>     >>>>> Rui Barradas
>>     >>>>>
>>     >>>>>  You need either to call it as:
>>     >>>>>>
>>     >>>>>>
>>     >>>>>> myfun( mydf , "age")
>>     >>>>>>
>>     >>>>>>
>>     >>>>>> # Or:
>>     >>>>>>
>>     >>>>>> age <- "age"
>>     >>>>>> myfun( mydf, age)
>>     >>>>>>
>>     >>>>>> Unless your value of the `age`-named variable was "age" in the calling
>>     >>>>>> environment (and you did not give us that value in either of your
>>     >>>>>> postings), you would fail.
>>     >>>>>>
>>     >>>>>
>>     >>>>> ______________________________________________
>>     >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>     >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>     >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>     >>>>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> 
>> 
> 

David Winsemius
Alameda, CA, USA


