[R] selecting dataframe columns based on substring of col name(s)

Wed Jun 21 21:08:50 CEST 2017

Assume there 100 columns, named col1, col2,..., col100 in data frame d
+ maybe some more columns with various names preceding them. You want
col21 to col72.

nm <- names(d)
d[, which(nm == "col21"): which(nm == "col72") ]

## NB : if all you have is col1 to col100 the d[, 23:72] works fine.

See any good tutorial on R for how to index matrix like structures in R.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Wed, Jun 21, 2017 at 9:11 AM, Evan Cooch <evan.cooch at gmail.com> wrote:
> Suppose I have the following sort of dataframe, where each column name has a
> common structure: prefix, followed by a number (for this example, col1,
> col2, col3 and col4):
>
>  d = data.frame( col1=runif(10), col2=runif(10),
> col3=runif(10),col4=runif(10))
>
> What I haven't been able to suss out is how to efficiently
> 'extract/manipulate/play with' columns from the data frame, making use of
> this common structure.
>
> Suppose, for example, I want to 'work with' col2, col3, and col4. Now, I
> could subset the dataframe d in any number of ways -- for example
>
> piece <- d[,c("col2","col3","col4")]
>
> Works as expected, but for *big* problems (where I might have dozens ->
> hundreds of columns -- often the case with big design matrices output by
> some linear models program or another), having to write them all out using
> c("col2","col3",...."colXXXXX") takes a lot of time. What I'm wondering
> about is if there is a way to simply select over the "changing part" of the
> column name (you can do this relatively easily in a data step in SAS, for
> example). Heuristically, something like:
>
> piece <- df[,col2:col4]
>
> where the heuristic col2:col4 is interpreted as col2 -> col4 (parse the
> prefix 'col', and then simply select over the changing suffic -- i.e.,
> column number).
>
> Now, if I use the "to" function in the lessR package, I can get there from
> here fairly easily:
>
> piece <- d[,to("col",4,from=2,same.size=FALSE)]
>
> But, is there a better way? Beyond 'efficiency' (ease of implementation),
> part of what constitutes 'better' might be something in base R, rather than
> relying on a package?
>
> Thanks in advance...
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.