[R] selecting dataframe columns based on substring of col name(s)

Bert Gunter bgunter.4567 at gmail.com
Wed Jun 21 21:08:50 CEST 2017

Assume there are 100 columns, named col1, col2, ..., col100, in data frame d,
plus maybe some more columns with various names preceding them. You want
col21 to col72.

nm <- names(d)
d[, which(nm == "col21"): which(nm == "col72") ]

## NB: if all you have is col1 to col100, then d[, 21:72] works fine.
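
A minimal sketch of the position-based selection on toy data. The data
frame below, including the extra columns id and grp, is made up purely for
illustration and assumes col1 ... col100 sit next to each other in order:

```r
# Toy data: two extra columns, then col1 ... col100 (all values made up)
set.seed(1)
d <- data.frame(id = 1:5, grp = letters[1:5],
                matrix(runif(500), nrow = 5,
                       dimnames = list(NULL, paste0("col", 1:100))))

nm <- names(d)
first <- which(nm == "col21")   # position of the first wanted column
last  <- which(nm == "col72")   # position of the last wanted column
piece <- d[, first:last]

# With only col1 ... col100 present, positions and suffixes coincide
d_only <- d[, paste0("col", 1:100)]
piece_only <- d_only[, 21:72]
```

Note that first:last relies on the columns being contiguous and in
increasing order; if they are not, selecting by name is safer.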

See any good tutorial on R for how to index matrix-like structures in R.
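
If the columns might not be in order, one possible alternative (a sketch,
not the only way) is to index by name rather than by position, either by
building the names with paste0() or by parsing the numeric suffix. The toy
data frame and the variable names wanted, is_col, num and keep are
assumptions made for illustration:

```r
# Toy data with the col* columns deliberately in reverse order
d <- data.frame(id = 1:3,
                matrix(0, nrow = 3, ncol = 100,
                       dimnames = list(NULL, paste0("col", 100:1))))

# Option 1: build the wanted names and index by name
wanted <- paste0("col", 21:72)
piece <- d[, wanted]            # columns come back in the order of 'wanted'

# Option 2: parse the numeric suffix and select on it
nm <- names(d)
is_col <- grepl("^col[0-9]+$", nm)            # names of the form col<number>
num <- rep(NA_integer_, length(nm))
num[is_col] <- as.integer(sub("^col", "", nm[is_col]))
keep <- is_col & num >= 21 & num <= 72        # FALSE for non-matching names
piece2 <- d[, keep]             # columns keep their original order in d
```

Option 1 fails with an error if any of the constructed names is missing
from d, which can be a useful check in itself.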


Bert Gunter

On Wed, Jun 21, 2017 at 9:11 AM, Evan Cooch <evan.cooch at gmail.com> wrote:
> Suppose I have the following sort of dataframe, where each column name has a
> common structure: prefix, followed by a number (for this example, col1,
> col2, col3 and col4):
>  d = data.frame( col1=runif(10), col2=runif(10),
> col3=runif(10),col4=runif(10))
> What I haven't been able to suss out is how to efficiently
> 'extract/manipulate/play with' columns from the data frame, making use of
> this common structure.
> Suppose, for example, I want to 'work with' col2, col3, and col4. Now, I
> could subset the dataframe d in any number of ways -- for example
> piece <- d[,c("col2","col3","col4")]
> Works as expected, but for *big* problems (where I might have dozens ->
> hundreds of columns -- often the case with big design matrices output by
> some linear models program or another), having to write them all out using
> c("col2","col3",...."colXXXXX") takes a lot of time. What I'm wondering
> about is if there is a way to simply select over the "changing part" of the
> column name (you can do this relatively easily in a data step in SAS, for
> example). Heuristically, something like:
> piece <- d[,col2:col4]
> where the heuristic col2:col4 is interpreted as col2 -> col4 (parse the
> prefix 'col', and then simply select over the changing suffix -- i.e.,
> column number).
> Now, if I use the "to" function in the lessR package, I can get there from
> here fairly easily:
> piece <- d[,to("col",4,from=2,same.size=FALSE)]
> But, is there a better way? Beyond 'efficiency' (ease of implementation),
> part of what constitutes 'better' might be something in base R, rather than
> relying on a package?
> Thanks in advance...
