[R] Write a function that allows access to columns of a passed dataframe.
David Winsemius
dwinsemius at comcast.net
Tue Dec 6 19:41:52 CET 2016
> On Dec 6, 2016, at 7:33 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>
> Perhaps the best way is the one used by library(), where both library(package) and library("package") work. It uses as.character/substitute, not deparse/substitute, as follows.
>
> mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
> mydf
> class(mydf)
> str(mydf)
>
> myfun <- function(frame,var){
> yy <- as.character(substitute(var))
> frame[, yy]
> }
>
> myfun(mydf, age)
> myfun(mydf, "age")
>
> Rui Barradas
>
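A runnable sketch of Rui's accessor, assuming the same mydf as above. It checks that a bare name and a quoted name both work. It also shows the catch Bill raises next: when the caller passes a variable that holds a column name (the hypothetical `col` below), substitute() only sees the symbol `col`, so the lookup fails.

```r
mydf <- data.frame(id = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

# Rui's accessor: substitute() captures the unevaluated argument and
# as.character() turns either the symbol age or the string "age" into "age".
myfun <- function(frame, var) {
  yy <- as.character(substitute(var))
  frame[, yy]
}

by_symbol <- myfun(mydf, age)    # bare name
by_string <- myfun(mydf, "age")  # quoted name

# The catch: a variable that *holds* a column name is not looked up.
# substitute() sees the symbol col, so frame[, "col"] signals an error.
col <- "age"
failed <- tryCatch(myfun(mydf, col), error = function(e) e)
```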
> On 06-12-2016 15:03, William Dunlap wrote:
>> I basically agree with Rui: using substitute will cause trouble. E.g., how
>> would the user iterate over the columns, calling your function for each?
>> for(column in names(dataFrame)) func(dataFrame, column)
>> would fail because dataFrame$column does not exist. You need to provide
>> an extra argument to handle this case, something like the following:
>> func <- function(df,
>>                  columnAsName,
>>                  columnAsString = deparse(substitute(columnAsName))[1]) {
>>     ...
>> }
>> The default value of columnAsString should also deal with the case that
>> the user supplied something like log(Conc.) instead of Conc.
>>
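A minimal sketch of the two-argument pattern Bill describes. The body (a membership check followed by df[[columnAsString]]) is an illustrative assumption, not Bill's code. An expression such as log(age) deparses to "log(age)", which is not a column name, so this sketch stops with an explicit error rather than guessing. Programmatic callers bypass substitute() entirely by naming columnAsString.

```r
mydf <- data.frame(id = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

# Sketch of Bill's pattern. The default for columnAsString is evaluated
# lazily inside func, so substitute() still sees the caller's expression.
# [1] keeps only the first line if deparse() splits a long expression.
func <- function(df, columnAsName,
                 columnAsString = deparse(substitute(columnAsName))[1]) {
  # Illustrative check: an expression such as log(age) deparses to
  # "log(age)", which is not a column, so stop with a clear message.
  if (!columnAsString %in% names(df))
    stop("no column named '", columnAsString, "' in df")
  df[[columnAsString]]
}

interactive_use <- func(mydf, age)

# Programmatic use: columnAsName is never supplied (or evaluated);
# the loop variable goes straight into columnAsString.
col_means <- sapply(c("id", "age"),
                    function(column) mean(func(mydf, columnAsString = column)))

bad <- tryCatch(func(mydf, log(age)), error = function(e) e)
```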
>> I think that using a formula for the lazily evaluated argument (columnAsName)
>> works well. The user then knows exactly how it gets evaluated.
Here is an implementation that supports multi-column extraction using a formula object:
mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
mydf
class(mydf)
str(mydf)
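myfun <- function(frame, vars){
  # terms() parses the one-sided formula; for ~age+sex its
  # "term.labels" attribute is c("age", "sex")
  yy <- terms(vars)
  frame[, attr(yy, "term.labels")]
}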
myfun(mydf, ~age+sex)
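The terms()-based version above works only when every term label is itself a column name; myfun(mydf, ~log(age)) would look for a column literally named "log(age)", which is the log(Conc.) case Bill mentions. One alternative, swapped in here and not proposed in the thread, is base R's model.frame(), which evaluates each variable of the formula with the data frame as data. The name myfun2 is hypothetical.

```r
mydf <- data.frame(id = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

# Hypothetical variant: model.frame() evaluates every variable in the
# formula with `frame` as the data, so expressions like log(age) work too.
myfun2 <- function(frame, vars) {
  # na.pass keeps incomplete rows; model.frame's default na.action
  # (usually na.omit via options("na.action")) could drop them.
  model.frame(vars, data = frame, na.action = na.pass)
}

plain <- myfun2(mydf, ~ age + sex)
transformed <- myfun2(mydf, ~ log(age) + sex)
names(transformed)  # columns are named after the expressions
```

The trade-off is that the result carries a "terms" attribute and follows modelling conventions, which may be more than a simple column accessor needs.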
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Tue, Dec 6, 2016 at 6:28 AM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
>>
>> Over my almost 50 years of programming, I have come to believe that if
>> one wants a program to be useful, one should write the program to do
>> as much work as possible and demand as little as possible from the
>> user of the program. In my opinion, one should not ask the person
>> who uses my function to remember to put the name of the data frame
>> column in quotation marks. The function should be written so that
>> all that needs to be passed is the name of the column; the function
>> should take care of the quotation marks.
>> Jihny
>>
>> > John David Sorkin M.D., Ph.D.
>> > Professor of Medicine
>> > Chief, Biostatistics and Informatics
>> > University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
>> > Baltimore VA Medical Center
>> > 10 North Greene Street
>> > GRECC (BT/18/GR)
>> > Baltimore, MD 21201-1524
>> > (Phone)410-605-7119
>> > (Fax)410-605-7913 (Please call phone number above prior to faxing)
>>
>>
>> > On Dec 6, 2016, at 3:17 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>> >
>> > Hello,
>> >
>> > Just to say that I wouldn't write the function as John did. I would get
>> > rid of all the deparse/substitute stuff and instinctively use a quoted
>> > argument as a column name. Something like the following.
>> >
>> > myfun <- function(frame, var){
>> > [...]
>> > col <- frame[, var] # or frame[[var]]
>> > [...]
>> > }
>> >
>> > myfun(mydf, "age") # much better, simpler, no promises.
>> >
>> > Rui Barradas
>> >
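For comparison, a runnable version of Rui's standard-evaluation approach, assuming the same mydf; the [...] parts of his sketch are left out, so this only does the extraction. Because var is an ordinary character value, it can come from a literal, from a variable such as the age <- "age" assignment David suggests further down, or from a loop over names(mydf).

```r
mydf <- data.frame(id = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

# Standard evaluation: var is an ordinary value, expected to be a single
# column name given as a character string.
myfun <- function(frame, var) {
  frame[[var]]
}

from_literal <- myfun(mydf, "age")

age <- "age"                  # a variable whose value is the column name
from_variable <- myfun(mydf, age)

# Works unchanged inside a loop over column names.
all_columns <- lapply(names(mydf), function(v) myfun(mydf, v))
```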
>> > On 05-12-2016 21:49, Bert Gunter wrote:
>> >> Typo: "lazy evaluation" not "lay evaluation."
>> >>
>> >> -- Bert
>> >>
>> >>
>> >>
>> >> Bert Gunter
>> >>
>> >> "The trouble with having an open mind is that people keep coming along
>> >> and sticking things into it."
>> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> >>
>> >>
>> >>> On Mon, Dec 5, 2016 at 1:46 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>> >>> Sorry, hit "Send" by mistake.
>> >>>
>> >>> Inline.
>> >>>
>> >>>
>> >>>
>> >>>> On Mon, Dec 5, 2016 at 1:34 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>> >>>> Inline.
>> >>>>
>> >>>>
>> >>>>> On Mon, Dec 5, 2016 at 9:53 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>> >>>>> Hello,
>> >>>>>
>> >>>>> Inline.
>> >>>>>
>> >>>>> On 05-12-2016 17:09, David Winsemius wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>>> On Dec 5, 2016, at 7:29 AM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
>> >>>>>>>
>> >>>>>>> Rui,
>> >>>>>>> I appreciate your suggestion, but eliminating the deparse statement does
>> >>>>>>> not solve my problem. Do you have any other suggestions? See code below.
>> >>>>>>> Thank you,
>> >>>>>>> John
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
>> >>>>>>> mydf
>> >>>>>>> class(mydf)
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> myfun <- function(frame,var){
>> >>>>>>> call <- match.call()
>> >>>>>>> print(call)
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> indx <- match(c("frame","var"),names(call),nomatch=0)
>> >>>>>>> print(indx)
>> >>>>>>> if(indx[1]==0) stop("Function called without sufficient arguments!")
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> cat("I can get the name of the dataframe as a text string!\n")
>> >>>>>>> #xx <- deparse(substitute(frame))
>> >>>>>>> print(xx)
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> cat("I can get the name of the column as a text string!\n")
>> >>>>>>> #yy <- deparse(substitute(var))
>> >>>>>>> print(yy)
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> # This does not work.
>> >>>>>>> print(frame[,var])
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> # This does not work.
>> >>>>>>> print(frame[,"var"])
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> # This does not work.
>> >>>>>>> col <- xx[,"yy"]
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> # Nor does this work.
>> >>>>>>> col <- xx[,yy]
>> >>>>>>> print(col)
>> >>>>>>> }
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> myfun(mydf,age)
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> When you use that calling syntax, the system will supply the values of
>> >>>>>> whatever the `age` variable contains. (And if there is no `age`-named
>> >>>>>> object, you get an error at the time of the call to `myfun`.)
>> >>>>>
>> >>>>>
>> >>>>> Actually, no, which was very surprising to me, but John's code worked (not
>> >>>>> the function, the call). And with the change I've proposed, it worked
>> >>>>> flawlessly. No errors. Why, I don't know.
>> >>>
>> >>> See ?substitute and in particular the example highlighted there.
>> >>>
>> >>> The technical details are explained in the R Language Definition
>> >>> manual. The key here is the use of promises for lazy evaluation. In
>> >>> fact, the expression in the call *is* available within the function,
>> >>> as is (a pointer to) the environment in which to evaluate the
>> >>> expression. That is how substitute() works. Specifically, quoting from
>> >>> the manual,
>> >>>
>> >>> *****
>> >>> It is possible to access the actual (not default) expressions used as
>> >>> arguments inside the function. The mechanism is implemented via
>> >>> promises. When a function is being evaluated the actual expression
>> >>> used as an argument is stored in the promise together with a pointer
>> >>> to the environment the function was called from. When (if) the
>> >>> argument is evaluated the stored expression is evaluated in the
>> >>> environment that the function was called from. Since only a pointer to
>> >>> the environment is used any changes made to that environment will be
>> >>> in effect during this evaluation. The resulting value is then also
>> >>> stored in a separate spot in the promise. Subsequent evaluations
>> >>> retrieve this stored value (a second evaluation is not carried out).
>> >>> Access to the unevaluated expression is also available using
>> >>> substitute.
>> >>> ********
>> >>>
>> >>> -- Bert
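A small demonstration of the promise behaviour described in the quoted passage, using hypothetical helper names (show_arg, force_arg, bump, twice) and a deliberately undefined name. It illustrates why John's call myfun(mydf, age) did not fail: substitute() reads the stored expression without evaluating it, and an error only arises if the argument is actually forced. The last part shows that a promise is evaluated at most once.

```r
# substitute() returns the expression stored in the promise; x is never forced.
show_arg <- function(x) {
  expr <- substitute(x)
  list(expr = expr, text = deparse(expr))
}
captured <- show_arg(no_such_object_xyz)  # no error, although nothing has this name

# Forcing the promise evaluates the expression in the caller's environment,
# which fails because no object of that name exists there.
force_arg <- function(x) {
  x
}
forced <- tryCatch(force_arg(no_such_object_xyz), error = function(e) e)

# The value is stored after the first evaluation, so bump() runs only once
# even though x is used twice inside twice().
counter <- 0
bump <- function() {
  counter <<- counter + 1
  counter
}
twice <- function(x) c(x, x)
vals <- twice(bump())
```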
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>>>
>> >>>>> Rui Barradas
>> >>>>>
>> >>>>> You need either to call it as:
>> >>>>>>
>> >>>>>>
>> >>>>>> myfun( mydf , "age")
>> >>>>>>
>> >>>>>>
>> >>>>>> # Or:
>> >>>>>>
>> >>>>>> age <- "age"
>> >>>>>> myfun( mydf, age)
>> >>>>>>
>> >>>>>> Unless your value of the `age`-named variable was "age" in the calling
>> >>>>>> environment (and you did not give us that value in either of your postings),
>> >>>>>> you would fail.
>> >>>>>>
>> >>>>>
>> >>>>> ______________________________________________
>> >>>>> R-help at r-project.org <mailto:R-help at r-project.org> mailing
>> list -- To UNSUBSCRIBE and more, see
>> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> <https://stat.ethz.ch/mailman/listinfo/r-help>
>> >>>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> <http://www.R-project.org/posting-guide.html>
>> >>>>> and provide commented, minimal, self-contained, reproducible
>> code.
>>
David Winsemius
Alameda, CA, USA