[Rd] a taste of regex in `[.data.frame`
Romain Francois
rfrancois at mango-solutions.com
Fri Apr 27 11:30:16 CEST 2007
Hello,
I am often asked how to filter rows of a data frame, for example how to
get all the Mazda cars from mtcars. This usually does the trick:
R> mtcars[ grep("Mazda", rownames(mtcars)) , ]
But what about accepting a formula in `[.data.frame`, so that this sort of
code reads more naturally:
# rownames matching "Mazda"
R> mtcars[~Mazda, ]
               mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
# first two rows of iris, with the columns matching the regex "Sep"
R> iris[1:2, ~Sep]
  Sepal.Length Sepal.Width
1          5.1         3.5
2          4.9         3.0
# first two rows of iris, with the columns matching the regex "\\."
R> iris[1:2, ~"\\."]
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
It is just a matter of adding these lines to `[.data.frame` after the
second line:
<code>
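# A formula in j selects the columns whose names match the regex.
if(!missing(j) && inherits(j, "formula"))
  j <- grep( j[[length(j)]], names(x) )
# A formula in i selects rows in x[i, j] but columns in x[i], hence the
# test on Narg (the argument count that `[.data.frame` has already computed).
if(!missing(i) && inherits(i, "formula"))
  i <- grep( i[[length(i)]], if(Narg>=3) rownames(x) else names(x) )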
</code>
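To try the idea without patching base R, the same logic can be sketched as a standalone function. This is only a minimal sketch: the names `regex_index` and `pick` are hypothetical and not part of the proposal, and `as.character()` is applied explicitly so that a symbol such as Mazda and a string such as "\\." are handled the same way.

```r
# Hypothetical helper: turn a one-sided formula such as ~Mazda or ~"\\."
# into the integer positions of the labels it matches.
regex_index <- function(f, labels) {
  stopifnot(inherits(f, "formula"))
  # The last element of a formula is its right-hand side:
  # a symbol for ~Mazda, a character string for ~"\\.".
  pattern <- f[[length(f)]]
  grep(as.character(pattern), labels)
}

# Hypothetical wrapper around `[` that accepts formulas for rows (i)
# and columns (j); anything else is passed through unchanged.
pick <- function(x, i, j) {
  rows <- if (missing(i)) {
    seq_len(nrow(x))
  } else if (inherits(i, "formula")) {
    regex_index(i, rownames(x))
  } else {
    i
  }
  cols <- if (missing(j)) {
    seq_along(x)
  } else if (inherits(j, "formula")) {
    regex_index(j, names(x))
  } else {
    j
  }
  x[rows, cols, drop = FALSE]
}

pick(mtcars, ~Mazda)      # rows whose names match "Mazda"
pick(iris, 1:2, ~Sep)     # first two rows, columns matching "Sep"
pick(iris, 1:2, ~"\\.")   # first two rows, columns containing a dot
```

Unlike the patch, this wrapper always treats i as a row index and always returns a data frame (drop = FALSE), which keeps the sketch short.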
I realize there are other places where this could be used, an
obvious one being `[<-.data.frame`, but I wanted to ask first whether this
would be of interest, or whether it is dangerous, confusing, ...
Cheers,
Romain
PS: One could also imagine adding a minus before the regex to drop
(rows|columns) instead of keeping them.
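The minus variant could be sketched along the following lines. The function name `regex_index_signed` is hypothetical, and the sketch assumes a version of R in which grep() has the invert argument. It relies on the fact that ~ -Mazda is stored as a call to unary minus that is never evaluated.

```r
# Hypothetical helper: like a plain regex lookup, but a leading minus
# in the formula (~ -Mazda) returns the positions that do NOT match.
regex_index_signed <- function(f, labels) {
  stopifnot(inherits(f, "formula"))
  rhs <- f[[length(f)]]
  # ~ -Mazda is stored as the unevaluated call `-`(Mazda).
  drop_matches <- is.call(rhs) &&
    identical(rhs[[1]], as.name("-")) &&
    length(rhs) == 2
  if (drop_matches) rhs <- rhs[[2]]
  grep(as.character(rhs), labels, invert = drop_matches)
}

# All cars except the Mazdas
mtcars[regex_index_signed(~ -Mazda, rownames(mtcars)), ]

# iris without the Sepal columns
iris[1:2, regex_index_signed(~ -Sep, names(iris))]
```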
--
Mango Solutions
data analysis that delivers
Tel: +44(0) 1249 467 467
Fax: +44(0) 1249 467 468
Mob: +44(0) 7813 526 123