[R] selecting dataframe columns based on substring of col name(s)
Evan Cooch
evan.cooch at gmail.com
Wed Jun 21 18:11:10 CEST 2017
Suppose I have the following sort of dataframe, where each column name
has a common structure: prefix, followed by a number (for this example,
col1, col2, col3 and col4):
d <- data.frame(col1=runif(10), col2=runif(10),
                col3=runif(10), col4=runif(10))
What I haven't been able to suss out is how to efficiently
'extract/manipulate/play with' columns from the data frame, making use
of this common structure.
Suppose, for example, I want to 'work with' col2, col3, and col4. Now, I
could subset the dataframe d in any number of ways -- for example
piece <- d[,c("col2","col3","col4")]
This works as expected, but for *big* problems (where I might have dozens to
hundreds of columns -- often the case with big design matrices output by
one linear-models program or another), having to write them all out
using c("col2","col3",...."colXXXXX") takes a lot of time. What I'm
wondering about is if there is a way to simply select over the "changing
part" of the column name (you can do this relatively easily in a data
step in SAS, for example). Heuristically, something like:
piece <- d[,col2:col4]
where the heuristic col2:col4 is interpreted as col2 -> col4 (parse the
prefix 'col', and then simply select over the changing suffix -- i.e.,
column number).
Now, if I use the "to" function in the lessR package, I can get there
from here fairly easily:
piece <- d[,to("col",4,from=2,same.size=FALSE)]
But, is there a better way? Beyond 'efficiency' (ease of
implementation), part of what constitutes 'better' might be something in
base R, rather than relying on a package?
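For concreteness, here is a rough sketch of what a base R version might look like. It is only a sketch under assumptions: it uses the toy data frame from above (the set.seed() call is added purely for reproducibility), it assumes every column name is the prefix 'col' followed by an integer, and the names piece1, piece2, piece3 and suffix are just illustrative.

```r
# Toy data frame matching the example above (seed added only for reproducibility)
set.seed(1)
d <- data.frame(col1 = runif(10), col2 = runif(10),
                col3 = runif(10), col4 = runif(10))

# 1. Build the column names from the prefix and the numeric range
piece1 <- d[, paste0("col", 2:4)]

# 2. subset() accepts a name range in 'select'; note that this picks the
#    columns lying *positionally* between col2 and col4, so it depends on
#    the column order rather than on the numeric suffix
piece2 <- subset(d, select = col2:col4)

# 3. Strip the prefix, convert the remainder to an integer, and filter on it
#    (assumes every name is "col" followed by digits; otherwise NAs appear)
suffix <- as.integer(sub("^col", "", names(d)))
piece3 <- d[, suffix >= 2 & suffix <= 4]
```

On this toy example all three should give the same three columns. The paste0() form seems closest to the 'prefix plus changing suffix' idea, while the subset() form is closest to the col2:col4 heuristic but would quietly pick up any unrelated column sitting between col2 and col4.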
Thanks in advance...