[R] programming: telling a function where to look for the entered variables

Fri Apr 1 14:28:23 CEST 2011

Thanks Nick and Juan for your replies.

Nick, thanks for pointing out the warning in subset(). I'm not sure,
though, that I understand the example you provided -- despite using
subset() rather than bracket notation, the original function (myfunct)
does what is expected of it. The problem I have is with the second
function (myfunct.better), where the variable names and the dataframe
are not fixed within the function but passed to it when it is called
-- and even with bracket notation I don't quite manage to tell R where
to look for the columns that correspond to the entered column names.
(But then perhaps I misunderstood you.)

This is what I tried (using bracket notation):

myfunct.better <- function(dataframe, subgroup, lvarname, yvarname){
Data.tmp <- dataframe[dataframe[,deparse(substitute(lvarname))]==subgroup,
c("xvar",deparse(substitute(yvarname)))]
}

but this creates an empty contingency table only -- perhaps because my
use of deparse() is flawed (I think what is converted into a string is
"lvarname" and "yvarname", rather than the column names that these two
function-variables represent in the dataframe)?
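
For reference, here is a minimal sketch of one way to check that guess and
to sidestep the question altogether. It is not a tested fix for the
function above: show_name() and myfunct.strings() are hypothetical names
made up for this illustration, the zvar column is left out, and the sketch
assumes the column names are passed as character strings and looked up
with [[ ]], so that deparse(substitute()) is not needed at all.

```r
# Reproducible data modelled on the example quoted below (zvar omitted).
set.seed(124)
n <- 1000
Fulldf <- data.frame(
  xvar = factor(sample(0:3, n, replace = TRUE),
                labels = c("blue", "green", "red", "yellow")),
  yvar = factor(sample(0:1, n, replace = TRUE),
                labels = c("area1", "area2")),
  lvar = factor(sample(0:1, n, replace = TRUE),
                labels = c("yes", "no"))
)

# Check what deparse(substitute()) yields when it is called directly in
# the body of a function: it is the expression supplied by the caller.
show_name <- function(lvarname) deparse(substitute(lvarname))
show_name(lvar)   # returns "lvar"

# Hypothetical variant: column names are passed as character strings.
myfunct.strings <- function(dataframe, subgroup, lvarname, yvarname,
                            xvarname = "xvar") {
  # rows belonging to the requested subgroup (rows with a missing
  # grouping value are dropped)
  keep <- !is.na(dataframe[[lvarname]]) & dataframe[[lvarname]] == subgroup
  # restrict to the two columns of interest and exclude missing values
  Data.tmp <- na.omit(dataframe[keep, c(xvarname, yvarname)])
  # [[ ]] evaluates its argument, whereas Data.tmp$yvarname would look
  # for a column literally called "yvarname"
  table(Data.tmp[[xvarname]], Data.tmp[[yvarname]])
}

tab <- myfunct.strings(Fulldf, "yes", lvarname = "lvar", yvarname = "yvar")
tab
```

Passing strings means the call site has to quote the names
(lvarname = "lvar" rather than lvarname = lvar), but the function then
makes no assumption about where the caller's variables live.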

2011/4/1 Nick Sabbe <nick.sabbe at ugent.be>:
> See the warning in ?subset.
> Passing the column name of lvar is not the same as passing the 'contextual
> column' (as I coin it in these circumstances).
> You can solve it by indeed using [] instead.
>
> For my own comfort, here is the relevant line from your original function:
> Data.tmp <- subset(Fulldf, lvar==subgroup, select=c("xvar","yvar"))
> Which should become something like (untested but should be close):
> Data.tmp <- Fulldf[Fulldf[,"lvar"]==subgroup, c("xvar","yvar")]
>
> This should be a lot easier to translate based on column names, as the
> column names are now used as such.
>
> HTH,
>
>
> Nick Sabbe
> --
> ping: nick.sabbe at ugent.be
> link: http://biomath.ugent.be
> wink: A1.056, Coupure Links 653, 9000 Gent
> ring: 09/264.59.36
>
> -- Do Not Disapprove
>
>
>
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of E Hofstadler
> Sent: Friday 1 April 2011 13:09
> To: r-help at r-project.org
> Subject: [R] programming: telling a function where to look for the entered
> variables
>
> Hi there,
>
> Could someone help me with the following programming problem..?
>
> I have written a function that works for my intended purpose, but it
> is quite closely tied to a particular dataframe and the names of the
> variables in this dataframe. However, I'd like to use the same
> function for different dataframes and variables. My problem is that
> I'm not quite sure how to tell my function in which dataframe the
> entered variables are located.
>
> Here's some reproducible data and the function:
>
> # create reproducible data
> set.seed(124)
> xvar <- sample(0:3, 1000, replace = TRUE)
> yvar <- sample(0:1, 1000, replace = TRUE)
> zvar <- rnorm(100)
> lvar <- sample(0:1, 1000, replace=T)
> Fulldf <- as.data.frame(cbind(xvar,yvar,zvar,lvar))
> Fulldf$xvar <- factor(xvar, labels=c("blue","green","red","yellow"))
> Fulldf$yvar <- factor(yvar, labels=c("area1","area2"))
> Fulldf$lvar <- factor(lvar, labels=c("yes","no"))
>
> And here's the function in the form in which it currently works: from a
> subset of the dataframe Fulldf, a contingency table is created (in my
> actual data, several other operations are then performed on that
> contingency table, but these are not relevant to the problem in
> question, so I've deleted them).
>
> # function as it currently works: tailored to a particular dataframe (Fulldf)
>
> # Enter a particular subgroup for which the contingency table should be
> # calculated (i.e. a particular value of the factor lvar).
> myfunct <- function(subgroup){
>    # restrict dataframe to given subgroup and two columns of the original
>    # dataframe
>    Data.tmp <- subset(Fulldf, lvar==subgroup, select=c("xvar","yvar"))
>    Data.tmp <- na.omit(Data.tmp) # exclude missing values
>    indextable <- table(Data.tmp$xvar, Data.tmp$yvar) # make contingency table
>    return(indextable)
> }
>
> Since I need to use the function with different dataframes and
> variable names, I'd like to be able to tell my function the name of
> the dataframe and variables it should use for calculating the index.
> This is how I tried to modify the first part of the function, but it
> didn't work:
>
> # function as I would like it to work: independent of any particular
> # dataframe or variable names (doesn't work)
>
> # Enter the subgroup, the variable names to be used and the dataframe
> # in which they are found.
> myfunct.better <- function(subgroup, lvarname, yvarname, dataframe){
>    # Trying to subset the given dataframe for the given subgroup of the
>    # given variable. The variable "xvar" happens to have the same name in
>    # all dataframes, but the variable yvarname has different names in the
>    # different dataframes.
>    Data.tmp <- subset(dataframe, lvarname==subgroup, select=c("xvar",
>    deparse(substitute(yvarname))))
>    Data.tmp <- na.omit(Data.tmp)
>    # create the contingency table on the basis of the entered variables
>    indextable <- table(Data.tmp$xvar, Data.tmp$yvarname)
>    return(indextable)
> }
>
> calling
>
> myfunct.better("yes", lvarname=lvar, yvarname=yvar, dataframe=Fulldf)
>
> results in the following error:
>
> Error in `[.data.frame`(x, r, vars, drop = drop) :
>  undefined columns selected
>
> My feeling is that R doesn't know where to look for the entered
> variables (lvar, yvar), but I'm not sure how to solve this problem. I
> tried using with() and even attach() within the function, but that
> didn't work.
>
> Any help is greatly appreciated.
>
> Best,
> Esther
>
> P.S.:
> Are there books that elaborate on programming in R for beginners -- and I
> mean things like how best to use vectorization instead of loops and
> general "best practice" tips for programming? Most of the books I've
> been looking at focus on applying R to particular statistical
> analyses, and deal only comparatively briefly with more general
> programming aspects. I was wondering if there are any books or tutorials
> out there that cover the latter aspects in a more elaborate and
> systematic way...?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>