[R] features that go away and indirection

Avi Gross @v|gro@@ @end|ng |rom ver|zon@net
Fri Feb 25 18:40:07 CET 2022


I have had to do some things indirectly lately and keep finding ways to do them that seem to be deprecated. Searching the Internet with key words often gets you info that is out of date but may still work.

Yes, much of this happens with packages outside base R. In particular, I want to do things like make multiple plots using the ggplot2 package while varying some of the parameters using character strings, such as binding a feature to a specific column dynamically. I have created some rather odd ways to get it done but would prefer doing things in a more accepted and reproducible manner.

So, yes, I can find functions such as aes_() and aes_string() that were meant to stand in for the aes() aesthetic constructor used in ggplot2, but they are now deprecated in favor of a newer set of selecting functions that may not be as useful elsewhere.

Can someone point me to info on how to interpolate things in standard R in a way that reliably converts them into the right context?

As an example, if you have a data.frame mydf with columns called alpha/beta/gamma, the construct mydf$alpha works but if I set

x <- "alpha"

then mydf$x fails: it returns NULL, because $ looks for a column literally named x.

The answer for this scenario is to use the bracket operators as in:

mydf[x]
mydf[[x]]

depending on whether you want a data.frame or a vector. Or the somewhat clunkier versions like:

`[`(mydf, x)
`[[`(mydf, x)
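
A minimal sketch of the difference, using a made-up mydf (the values are purely illustrative; only the column names come from the example above):

```r
# Hypothetical data frame with the column names used in the text
mydf <- data.frame(alpha = 1:3,
                   beta  = c(2.5, 3.5, 4.5),
                   gamma = c("a", "b", "c"))
x <- "alpha"

by_dollar <- mydf$x      # NULL: `$` looks for a column literally named "x"
as_df     <- mydf[x]     # one-column data.frame
as_vec    <- mydf[[x]]   # plain vector, the same values as mydf$alpha

# The function-call forms give the same results as the bracket forms
same_df  <- identical(`[`(mydf, x), as_df)
same_vec <- identical(`[[`(mydf, x), as_vec)
```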

But the wonderful features of R can also be terribly frustrating when they seemingly work against you. I am talking about things like delayed evaluation. If I want to do something seemingly simple, like take a subset of the columns of a data.frame and plot each column as a new graph of some kind, how does one do it?

Assume you are in the middle of a loop and the index variable is called "ind" and sequentially takes on the names of the columns you want.

One solution I have used is to make a modified copy of the data that always gains a new column with a fixed name like do_me, holding the data from the current "ind" column, so that the ggplot (or whatever) command can be hardcoded to plot do_me. The other strings that mention the name of the variable, such as an axis label, are easier to adjust. This solution can work, but it seems forced and does not necessarily generalize well.
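
For comparison, here is a hedged sketch of the same loop without the do_me copy. It assumes ggplot2 is installed and relies on the .data pronoun, which the ggplot2 documentation suggests as the replacement for aes_string(). The data, the plot_one() helper and the choice of a histogram are all made up for illustration. Wrapping the plot in a function and calling force() is one way to stop delayed evaluation from picking up a later value of "ind".

```r
# Assumes ggplot2 is installed; the block is skipped otherwise
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Hypothetical data; only the idea of looping over column names is from the text
  mydf <- data.frame(alpha = c(1, 2, 2, 3), beta = c(4, 5, 5, 6))

  # Hypothetical helper: one plot per column name
  plot_one <- function(df, ind) {
    force(ind)  # fix the value of ind now, since aes() is evaluated lazily
    ggplot(df, aes(x = .data[[ind]])) +
      geom_histogram(bins = 5) +
      labs(x = ind, title = paste("Distribution of", ind))
  }

  cols  <- c("alpha", "beta")
  plots <- lapply(cols, function(ind) plot_one(mydf, ind))
  names(plots) <- cols
}
```

Printing plots$alpha or plots$beta then draws the corresponding graph. A bare for loop that stores aes(.data[[ind]]) plots in a list can end up showing the last column in every plot, which is the kind of surprise described above.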

An even weirder solution is to use techniques to generate a little program section as a character string or written to a file and then execute it or source it. Again, it works well, especially if I use something like the glue package to interpolate needed things. But it just feels like much more work than is needed.
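
A minimal base R sketch of that string-building approach, using sprintf() in place of glue so that it needs no extra packages. The data and the statement being built are hypothetical:

```r
# Hypothetical data and loop variable
mydf <- data.frame(alpha = 1:3, beta = 4:6)
ind  <- "beta"

# Build the statement as text, with the column name spliced in
code <- sprintf("mean(mydf$%s)", ind)   # the string "mean(mydf$beta)"

# parse() turns the text into an expression; eval() runs it
result <- eval(parse(text = code))
```

It works, but any typo in the generated text only shows up when the string is parsed or evaluated, which is part of why it feels like more work than needed.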

Again, this is not a problem specific to packages like the tidyverse for me. It is about making sure I can control what is interpolated into my code at the time of my choosing. Some packages come with all kinds of tools that allow various kinds of misdirection or indirection, and in some contexts that is fine. As an example, there are accessory functions in the tidyverse that can be used to select columns dynamically, as in starts_with("SMALL") or matches("^EXACTLY$"), that might allow me to do things there, but how do I get that functionality in the middle of a ggplot call?
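
One rough base R stand-in for those selectors is to compute the column names first and then use them as ordinary strings. This is a sketch with made-up column names, not the tidyverse mechanism itself:

```r
# Hypothetical data frame whose names match the patterns mentioned above
mydf <- data.frame(SMALL_a = 1:3, SMALL_b = 4:6, EXACTLY = 7:9, other = 10:12)

# Roughly what starts_with("SMALL") would pick
starts <- names(mydf)[startsWith(names(mydf), "SMALL")]

# Roughly what matches("^EXACTLY$") would pick
exact <- grep("^EXACTLY$", names(mydf), value = TRUE)

# The selected names are plain strings, so they can drive any loop
col_means <- vapply(c(starts, exact),
                    function(ind) mean(mydf[[ind]]),
                    numeric(1))
```

The resulting character vector can then be fed, one name at a time, to whatever plotting code accepts a column name as a string.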

I know this is not a new issue. I have read widely and may already have seen some of the methods used, such as substitute(), parse() and others in some combination, that may convert a string or a variable containing a string into a "name" or other internal R structure. I am wondering if someone can point me in that direction.
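
For concreteness, a small base R sketch of the kind of conversion meant here, using as.name(), bquote() and substitute(). The data frame and the calls being built are hypothetical examples:

```r
# Hypothetical data and string
mydf <- data.frame(alpha = 1:3, beta = 4:6)
x    <- "alpha"

sym <- as.name(x)   # the string "alpha" becomes the symbol alpha

# bquote() splices the symbol into an unevaluated call: mean(alpha)
call1 <- bquote(mean(.(sym)))
res1  <- eval(call1, mydf)   # evaluated with the columns of mydf as variables

# substitute() with an explicit replacement list builds: mydf$alpha
call2 <- substitute(df$col, list(df = quote(mydf), col = sym))
res2  <- eval(call2)
```

Unlike the parse() route, the code here is built from language objects rather than text, so there is no string to mistype.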

Avi


