[Rd] a taste of regex in `[.data.frame`

Fri Apr 27 11:30:16 CEST 2007

Hello,

I am often asked how to filter rows of a data frame, for example to get
all the Mazda cars from mtcars. This usually does the trick:

R> mtcars[ grep("Mazda", rownames(mtcars)) ,  ]

But what about accepting a formula in `[.data.frame` to make that sort of
code more readable:

# rownames matching "Mazda"
R> mtcars[~Mazda, ]
               mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4

# first two rows of iris, with the columns matching the regex "Sep"
R> iris[1:2, ~Sep]
  Sepal.Length Sepal.Width
1          5.1         3.5
2          4.9         3.0

# first two rows of iris, with the columns matching the regex "\\."
R> iris[1:2, ~"\\."]
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

It is just a matter of adding these lines to `[.data.frame` after the
second line:

<code>
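    # a formula in the column position: keep the columns whose names
    # match the right-hand side of the formula
    if(!missing(j) && inherits(j, "formula"))
       j <- grep( j[[length(j)]], names(x) )

    # a formula in the first position: match row names in x[i, j], but
    # column names in x[i] (Narg is defined earlier in `[.data.frame`)
    if(!missing(i) && inherits(i, "formula"))
       i <- grep( i[[length(i)]], if(Narg>=3) rownames(x) else names(x))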
</code>
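
To try the idea without patching base R, here is a minimal sketch of the same logic as a standalone helper. The names formula_to_index and rx_subset are hypothetical and only for illustration; the sketch assumes the right-hand side of the formula is a single symbol or string, and it only covers the two-index form x[i, j].

```r
# Hypothetical helper: turn a one-sided formula into the indices of the
# labels matching its right-hand side; any other index is returned as is.
formula_to_index <- function(idx, labels) {
  if (!inherits(idx, "formula")) return(idx)
  pattern <- idx[[length(idx)]]        # a symbol (~Mazda) or a string (~"\\.")
  grep(as.character(pattern), labels)
}

# Hypothetical wrapper around the ordinary two-index form x[i, j].
rx_subset <- function(x, i, j) {
  rows <- if (missing(i)) seq_len(nrow(x)) else formula_to_index(i, rownames(x))
  cols <- if (missing(j)) seq_along(x) else formula_to_index(j, names(x))
  x[rows, cols, drop = FALSE]
}

rx_subset(mtcars, ~Mazda)      # rows whose names match "Mazda"
rx_subset(iris, 1:2, ~Sep)     # first two rows, columns matching "Sep"
rx_subset(iris, 1:2, ~"\\.")   # first two rows, columns containing a dot
```

Unlike the patch above, this leaves `[.data.frame` untouched, so existing code cannot be affected by it.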

I realize that there are other places where this could be used, an
obvious one being `[<-.data.frame`, but I wanted to ask first whether it
would be of interest, or whether it is dangerous, confusing, ...

Cheers,

Romain

PS: I can also imagine adding a minus before the regex to drop
(rows|columns) instead of keeping them.
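
A rough sketch of how the minus could be detected, assuming it is written inside the formula as in ~ -Sep. The function name formula_to_index_neg is hypothetical. It relies on the fact that a unary minus parses as a call to `-` with a single argument, and it returns positive indices so that a pattern with no match drops nothing (a negated empty index would select nothing at all).

```r
# Hypothetical helper: like a plain grep() on the right-hand side, but a
# leading unary minus inverts the selection.
formula_to_index_neg <- function(idx, labels) {
  if (!inherits(idx, "formula")) return(idx)
  rhs <- idx[[length(idx)]]
  # ~ -Sep stores the call `-`(Sep) as its right-hand side
  negate <- is.call(rhs) && length(rhs) == 2 && identical(rhs[[1]], as.name("-"))
  if (negate) rhs <- rhs[[2]]
  hits <- grepl(as.character(rhs), labels)
  which(if (negate) !hits else hits)
}

formula_to_index_neg(~ -Sep, names(iris))     # 3 4 5
formula_to_index_neg(~ -"\\.", names(iris))   # 5
formula_to_index_neg(~Sep, names(iris))       # 1 2
```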

-- 
Mango Solutions
data analysis that delivers

Tel:  +44(0) 1249 467 467
Fax:  +44(0) 1249 467 468
Mob:  +44(0) 7813 526 123