[R] Write a function that allows access to columns of a passed dataframe.

Bert Gunter bgunter.4567 at gmail.com
Tue Dec 6 23:00:40 CET 2016


Simpler I think: ?all.vars

> all.vars(~A+B)
[1] "A" "B"

Note also:

> all.vars(~log(A))
[1] "A"
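
A minimal sketch of how all.vars() could slot into the myfun() discussed
further down the thread. The name select_cols and the check for missing
columns are assumptions made for illustration, not something proposed in
the thread.

```r
# Hypothetical helper (name assumed): pick the columns named in a formula.
select_cols <- function(frame, vars) {
  stopifnot(is.data.frame(frame), inherits(vars, "formula"))
  cols <- all.vars(vars)                  # variable names only, e.g. "A" from ~log(A)
  absent <- setdiff(cols, names(frame))   # names that are not columns of frame
  if (length(absent) > 0) {
    stop("No such column(s): ", paste(absent, collapse = ", "))
  }
  frame[, cols, drop = FALSE]             # drop = FALSE keeps a data frame for one column
}

mydf <- data.frame(id = 1:5,
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

select_cols(mydf, ~ age + sex)   # the age and sex columns, in formula order
select_cols(mydf, ~ log(age))    # the untransformed age column

# For comparison, term labels keep the whole call, which is not a column name:
attr(terms(~ log(age)), "term.labels")
```

The difference from the terms() version quoted below is that term labels
would give "log(age)", which cannot be used to index the data frame,
whereas all.vars() gives "age".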

Cheers,
Bert





"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Dec 6, 2016 at 10:41 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>
>> On Dec 6, 2016, at 7:33 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>>
>> Perhaps the best way is the one used by library(), where both library(package) and library("package") work. It uses as.character/substitute, not deparse/substitute, as follows.
>>
>> mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
>> mydf
>> class(mydf)
>> str(mydf)
>>
>> myfun <- function(frame,var){
>>       yy <- as.character(substitute(var))
>>       frame[, yy]
>> }
>>
>> myfun(mydf, age)
>> myfun(mydf, "age")
>>
>> Rui Barradas
>>
>> Em 06-12-2016 15:03, William Dunlap escreveu:
>>> I basically agree with Rui - using substitute will cause trouble.  E.g., how
>>> would the user iterate over the columns, calling your function for each?
>>>      for(column in names(dataFrame)) func(dataFrame, column)
>>> would fail because dataFrame$column does not exist.  You need to provide
>>> an extra argument to handle this case. something like the following:
>>>      func <- function(df,
>>>          columnAsName,
>>>          columnAsString = deparse(substitute(columnAsName))[1]) {
>>>          ...
>>>      }
>>> The default value of columnAsString should also deal with the case that
>>> the user supplied something like log(Conc.) instead of Conc.
>>>
>>> I think that using a formula for the lazily evaluated argument (columnAsName)
>>> works well.  The user then knows exactly how it gets evaluated.
>
> This would be an implementation that would support a multi-column extraction using a formula object:
>
> mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
> mydf
> class(mydf)
> str(mydf)
>
> myfun <- function(frame, vars){
>         yy <- terms(vars)
>         frame[, attr(yy, "term.labels")]
> }
>
> myfun(mydf, ~age+sex)
>
>
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>>>
>>> On Tue, Dec 6, 2016 at 6:28 AM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
>>>
>>>    Over my almost 50 years programming, I have come to believe that if
>>>    one wants a program to be useful, one should write the program to do
>>>    as much work as possible and demand as little as possible from the
>>>    user of the program. In my opinion, one should not ask the person
>>>    who uses my function to remember to put the name of the data frame
>>>    column in quotation marks. The function should be written so that
>>>    all that needs to be passed is the name of the column; the function
>>>    should take care of the quotation marks.
>>>    John
>>>
>>>    > John David Sorkin M.D., Ph.D.
>>>    > Professor of Medicine
>>>    > Chief, Biostatistics and Informatics
>>>    > University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
>>>    > Baltimore VA Medical Center
>>>    > 10 North Greene Street
>>>    > GRECC (BT/18/GR)
>>>    > Baltimore, MD 21201-1524
>>>    > (Phone)410-605-7119
>>>    > (Fax)410-605-7913 (Please call phone number above prior to faxing)
>>>
>>>
>>>     > On Dec 6, 2016, at 3:17 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>>>     >
>>>     > Hello,
>>>     >
>>>     > Just to say that I wouldn't write the function as John did. I would get
>>>     > rid of all the deparse/substitute stuff and instinctively use a quoted
>>>     > argument as a column name. Something like the following.
>>>     >
>>>     > myfun <- function(frame, var){
>>>     >    [...]
>>>     >    col <- frame[, var]  # or frame[[var]]
>>>     >    [...]
>>>     > }
>>>     >
>>>     > myfun(mydf, "age")  # much better, simpler, no promises.
>>>     >
>>>     > Rui Barradas
>>>     >
>>>     > Em 05-12-2016 21:49, Bert Gunter escreveu:
>>>     >> Typo: "lazy evaluation" not "lay evaluation."
>>>     >>
>>>     >> -- Bert
>>>     >>
>>>     >>> On Mon, Dec 5, 2016 at 1:46 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>>     >>> Sorry, hit "Send" by mistake.
>>>     >>>
>>>     >>> Inline.
>>>     >>>
>>>     >>>
>>>     >>>
>>>     >>>> On Mon, Dec 5, 2016 at 1:34 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>>     >>>> Inline.
>>>     >>>>
>>>     >>>> -- Bert
>>>     >>>>
>>>     >>>>> On Mon, Dec 5, 2016 at 9:53 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>>>     >>>>> Hello,
>>>     >>>>>
>>>     >>>>> Inline.
>>>     >>>>>
>>>     >>>>> Em 05-12-2016 17:09, David Winsemius escreveu:
>>>     >>>>>>
>>>     >>>>>>
>>>     >>>>>>> On Dec 5, 2016, at 7:29 AM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
>>>     >>>>>>>
>>>     >>>>>>> Rui,
>>>     >>>>>>> I appreciate your suggestion, but eliminating the deparse statement does
>>>     >>>>>>> not solve my problem. Do you have any other suggestions? See code below.
>>>     >>>>>>> Thank you,
>>>     >>>>>>> John
>>>     >>>>>>>
>>>     >>>>>>>
>>>     >>>>>>> mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
>>>     >>>>>>> mydf
>>>     >>>>>>> class(mydf)
>>>     >>>>>>>
>>>     >>>>>>>
>>>     >>>>>>> myfun <- function(frame,var){
>>>     >>>>>>>   call <- match.call()
>>>     >>>>>>>   print(call)
>>>     >>>>>>>
>>>     >>>>>>>
>>>     >>>>>>>   indx <- match(c("frame","var"),names(call),nomatch=0)
>>>     >>>>>>>   print(indx)
>>>     >>>>>>>   if(indx[1]==0) stop("Function called without sufficient arguments!")
>>>     >>>>>>>
>>>     >>>>>>>
>>>     >>>>>>>   cat("I can get the name of the dataframe as a text string!\n")
>>>     >>>>>>>   #xx <- deparse(substitute(frame))
>>>     >>>>>>>   print(xx)
>>>     >>>>>>>
>>>     >>>>>>>
>>>     >>>>>>>   cat("I can get the name of the column as a text string!\n")
>>>     >>>>>>>   #yy <- deparse(substitute(var))
>>>     >>>>>>>   print(yy)
>>>     >>>>>>>
>>>     >>>>>>>
>>>     >>>>>>>   # This does not work.
>>>     >>>>>>>   print(frame[,var])
>>>     >>>>>>>
>>>     >>>>>>>
>>>     >>>>>>>   # This does not work.
>>>     >>>>>>>   print(frame[,"var"])
>>>     >>>>>>>
>>>     >>>>>>>
>>>     >>>>>>>
>>>     >>>>>>>
>>>     >>>>>>>   # This does not work.
>>>     >>>>>>>   col <- xx[,"yy"]
>>>     >>>>>>>
>>>     >>>>>>>
>>>     >>>>>>>   # Nor does this work.
>>>     >>>>>>>   col <- xx[,yy]
>>>     >>>>>>>   print(col)
>>>     >>>>>>> }
>>>     >>>>>>>
>>>     >>>>>>>
>>>     >>>>>>> myfun(mydf,age)
>>>     >>>>>>
>>>     >>>>>>
>>>     >>>>>>
>>>     >>>>>> When you use that calling syntax, the system will supply the values of
>>>     >>>>>> whatever the `age` variable contains. (And if there is no `age`-named
>>>     >>>>>> object, you get an error at the time of the call to `myfun`.)
>>>     >>>>>
>>>     >>>>>
>>>     >>>>> Actually, no, which was very surprising to me but John's code worked (not
>>>     >>>>> the function, the call). And with the change I've proposed, it worked
>>>     >>>>> flawlessly. No errors. Why I don't know.
>>>     >>>
>>>     >>> See ?substitute and in particular the example highlighted there.
>>>     >>>
>>>     >>> The technical details are explained in the R Language Definition
>>>     >>> manual. The key here is the use of promises for lay evaluations. In
>>>     >>> fact, the expression in the call *is* available within the functions,
>>>     >>> as is (a pointer to) the environment in which to evaluate the
>>>     >>> expression. That is how substitute() works. Specifically, quoting from
>>>     >>> the manual,
>>>     >>>
>>>     >>> *****
>>>     >>> It is possible to access the actual (not default) expressions used as
>>>     >>> arguments inside the function. The mechanism is implemented via
>>>     >>> promises. When a function is being evaluated the actual expression
>>>     >>> used as an argument is stored in the promise together with a pointer
>>>     >>> to the environment the function was called from. When (if) the
>>>     >>> argument is evaluated the stored expression is evaluated in the
>>>     >>> environment that the function was called from. Since only a pointer to
>>>     >>> the environment is used any changes made to that environment will be
>>>     >>> in effect during this evaluation. The resulting value is then also
>>>     >>> stored in a separate spot in the promise. Subsequent evaluations
>>>     >>> retrieve this stored value (a second evaluation is not carried out).
>>>     >>> Access to the unevaluated expression is also available using
>>>     >>> substitute.
>>>     >>> ********
>>>     >>>
>>>     >>> -- Bert
>>>     >>>
>>>     >>>
>>>     >>>
>>>     >>>
>>>     >>>>>
>>>     >>>>> Rui Barradas
>>>     >>>>>
>>>     >>>>>  You need either to call it as:
>>>     >>>>>>
>>>     >>>>>>
>>>     >>>>>> myfun( mydf , "age")
>>>     >>>>>>
>>>     >>>>>>
>>>     >>>>>> # Or:
>>>     >>>>>>
>>>     >>>>>> age <- "age"
>>>     >>>>>> myfun( mydf, age)
>>>     >>>>>>
>>>     >>>>>> Unless your value of the `age`-named variable was "age" in the calling
>>>     >>>>>> environment (and you did not give us that value in either of your postings),
>>>     >>>>>> you would fail.
>>>     >>>>>>
>>>     >>>>>
>>>     >>>>> ______________________________________________
>>>     >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>     >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>     >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>     >>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>
>>
>
> David Winsemius
> Alameda, CA, USA
>


