[R] Write a function that allows access to columns of a passed dataframe.
Bert Gunter
bgunter.4567 at gmail.com
Tue Dec 6 23:00:40 CET 2016
Simpler I think: ?all.vars
> all.vars(~A+B)
[1] "A" "B"
Note also:
> all.vars(~log(A))
[1] "A"
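
For what it is worth, here is a minimal sketch of how all.vars() might be used in place of terms() in the formula-based function quoted below. The name myfun2, the column check, and the drop = FALSE choice are assumptions of this sketch, not anything proposed in the thread; mydf is the example data frame used throughout.

```r
# Example data frame used throughout the thread
mydf <- data.frame(id = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

# myfun2 is a hypothetical name: it takes a one-sided formula and
# returns the columns whose names appear anywhere in that formula.
myfun2 <- function(frame, vars) {
  stopifnot(is.data.frame(frame), inherits(vars, "formula"))
  cols <- all.vars(vars)  # variable names only, e.g. "age" from ~log(age)
  missing_cols <- setdiff(cols, names(frame))
  if (length(missing_cols) > 0) {
    stop("not columns of the data frame: ",
         paste(missing_cols, collapse = ", "))
  }
  frame[, cols, drop = FALSE]  # drop = FALSE keeps a data frame for one column
}

myfun2(mydf, ~ age + sex)  # columns age and sex
myfun2(mydf, ~ log(age))   # column age, untransformed

# By contrast, terms() labels the second formula "log(age)",
# which is not a column name in mydf.
attr(terms(~ log(age)), "term.labels")
```

The difference only matters when the formula contains a function call: the term label is then the whole deparsed call, while all.vars() returns just the variable inside it.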
Cheers,
Bert
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Tue, Dec 6, 2016 at 10:41 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>
>> On Dec 6, 2016, at 7:33 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>>
>> Perhaps the best way is the one used by library(), where both library(package) and library("package") work. It uses as.character/substitute, not deparse/substitute, as follows.
>>
>> mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
>> mydf
>> class(mydf)
>> str(mydf)
>>
>> myfun <- function(frame,var){
>> yy <- as.character(substitute(var))
>> frame[, yy]
>> }
>>
>> myfun(mydf, age)
>> myfun(mydf, "age")
>>
>> Rui Barradas
>>
>> Em 06-12-2016 15:03, William Dunlap escreveu:
>>> I basically agree with Rui - using substitute will cause trouble. E.g., how
>>> would the user iterate over the columns, calling your function for each?
>>> for(column in dataFrame) func(column)
>>> would fail because dataFrame$column does not exist. You need to provide
>>> an extra argument to handle this case. something like the following:
>>> func <- function(df,
>>>                  columnAsName,
>>>                  columnAsString = deparse(substitute(columnAsName))[1]) {
>>> ...
>>> }
>>> The default value of columnAsString should also deal with the case that
>>> the user supplied something like log(Conc.) instead of Conc.
>>>
>>> I think that using a formula for the lazily evaluated argument (columnAsName)
>>> works well. The user then knows exactly how it gets evaluated.
>
> This would be an implementation that would support a multi-column extraction using a formula object:
>
> mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
> mydf
> class(mydf)
> str(mydf)
>
> myfun <- function(frame, vars){
> yy <- terms(vars)
> frame[, attr(yy, "term.labels")]
> }
>
> myfun(mydf, ~age+sex)
>
>
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com <http://tibco.com>
>>>
>>> On Tue, Dec 6, 2016 at 6:28 AM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
>>>
>>> Over my almost 50 years programming, I have come to believe that if
>>> one wants a program to be useful, one should write the program to do
>>> as much work as possible and demand as little as possible from the
>>> user of the program. In my opinion, one should not ask the person
>>> who uses my function to remember to put the name of the data frame
>>> column in quotation marks. The function should be written so that
>>> all that needs to be passed is the name of the column; the function
>>> should take care of the quotation marks.
>>> John
>>>
>>> > John David Sorkin M.D., Ph.D.
>>> > Professor of Medicine
>>> > Chief, Biostatistics and Informatics
>>> > University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
>>> > Baltimore VA Medical Center
>>> > 10 North Greene Street
>>> > GRECC (BT/18/GR)
>>> > Baltimore, MD 21201-1524
>>> > (Phone)410-605-7119
>>> > (Fax)410-605-7913 (Please call phone number above prior to faxing)
>>>
>>>
>>> > On Dec 6, 2016, at 3:17 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>>> >
>>> > Hello,
>>> >
>>> > Just to say that I wouldn't write the function as John did. I would get
>>> > rid of all the deparse/substitute stuff and instinctively use a quoted
>>> > argument as a column name. Something like the following.
>>> >
>>> > myfun <- function(frame, var){
>>> > [...]
>>> > col <- frame[, var] # or frame[[var]]
>>> > [...]
>>> > }
>>> >
>>> > myfun(mydf, "age") # much better, simpler, no promises.
>>> >
>>> > Rui Barradas
>>> >
>>> > Em 05-12-2016 21:49, Bert Gunter escreveu:
>>> >> Typo: "lazy evaluation" not "lay evaluation."
>>> >>
>>> >> -- Bert
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>> On Mon, Dec 5, 2016 at 1:46 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>> >>> Sorry, hit "Send" by mistake.
>>> >>>
>>> >>> Inline.
>>> >>>
>>> >>>
>>> >>>
>>> >>>> On Mon, Dec 5, 2016 at 1:34 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>> >>>> Inline.
>>> >>>>
>>> >>>> -- Bert
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>> On Mon, Dec 5, 2016 at 9:53 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>>> >>>>> Hello,
>>> >>>>>
>>> >>>>> Inline.
>>> >>>>>
>>> >>>>> Em 05-12-2016 17:09, David Winsemius escreveu:
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>> On Dec 5, 2016, at 7:29 AM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
>>> >>>>>>>
>>> >>>>>>> Rui,
>>> >>>>>>> I appreciate your suggestion, but eliminating the deparse statement does
>>> >>>>>>> not solve my problem. Do you have any other suggestions? See code below.
>>> >>>>>>> Thank you,
>>> >>>>>>> John
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
>>> >>>>>>> mydf
>>> >>>>>>> class(mydf)
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> myfun <- function(frame,var){
>>> >>>>>>> call <- match.call()
>>> >>>>>>> print(call)
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> indx <- match(c("frame","var"),names(call),nomatch=0)
>>> >>>>>>> print(indx)
>>> >>>>>>> if(indx[1]==0) stop("Function called without sufficient arguments!")
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> cat("I can get the name of the dataframe as a text string!\n")
>>> >>>>>>> #xx <- deparse(substitute(frame))
>>> >>>>>>> print(xx)
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> cat("I can get the name of the column as a text string!\n")
>>> >>>>>>> #yy <- deparse(substitute(var))
>>> >>>>>>> print(yy)
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> # This does not work.
>>> >>>>>>> print(frame[,var])
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> # This does not work.
>>> >>>>>>> print(frame[,"var"])
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> # This does not work.
>>> >>>>>>> col <- xx[,"yy"]
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> # Nor does this work.
>>> >>>>>>> col <- xx[,yy]
>>> >>>>>>> print(col)
>>> >>>>>>> }
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> myfun(mydf,age)
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> When you use that calling syntax, the system will supply the values of
>>> >>>>>> whatever the `age` variable contains. (And if there is no `age`-named
>>> >>>>>> object, you get an error at the time of the call to `myfun`.)
>>> >>>>>
>>> >>>>>
>>> >>>>> Actually, no, which was very surprising to me but John's code worked (not
>>> >>>>> the function, the call). And with the change I've proposed, it worked
>>> >>>>> flawlessly. No errors. Why I don't know.
>>> >>>
>>> >>> See ?substitute and in particular the example highlighted there.
>>> >>>
>>> >>> The technical details are explained in the R Language Definition
>>> >>> manual. The key here is the use of promises for lay evaluations. In
>>> >>> fact, the expression in the call *is* available within the functions,
>>> >>> as is (a pointer to) the environment in which to evaluate the
>>> >>> expression. That is how substitute() works. Specifically, quoting from
>>> >>> the manual,
>>> >>>
>>> >>> *****
>>> >>> It is possible to access the actual (not default) expressions used as
>>> >>> arguments inside the function. The mechanism is implemented via
>>> >>> promises. When a function is being evaluated the actual expression
>>> >>> used as an argument is stored in the promise together with a pointer
>>> >>> to the environment the function was called from. When (if) the
>>> >>> argument is evaluated the stored expression is evaluated in the
>>> >>> environment that the function was called from. Since only a pointer to
>>> >>> the environment is used any changes made to that environment will be
>>> >>> in effect during this evaluation. The resulting value is then also
>>> >>> stored in a separate spot in the promise. Subsequent evaluations
>>> >>> retrieve this stored value (a second evaluation is not carried out).
>>> >>> Access to the unevaluated expression is also available using
>>> >>> substitute.
>>> >>> ********
>>> >>>
>>> >>> -- Bert
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>>>
>>> >>>>> Rui Barradas
>>> >>>>>
>>> >>>>>> You need either to call it as:
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> myfun( mydf , "age")
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> # Or:
>>> >>>>>>
>>> >>>>>> age <- "age"
>>> >>>>>> myfun( mydf, age)
>>> >>>>>>
>>> >>>>>> Unless your value of the `age`-named variable was "age" in the calling
>>> >>>>>> environment (and you did not give us that value in either of your postings),
>>> >>>>>> you would fail.
>>> >>>>>>
>>> >>>>>
>>> >>>>> ______________________________________________
>>> >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> >>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>
>>
>
> David Winsemius
> Alameda, CA, USA
>