[R] Subset using grepl

Prof Brian Ripley ripley at stats.ox.ac.uk
Sat Jan 29 11:54:46 CET 2011


The grep condition is "[A-J]"

BTW, there are lots of unnecessary steps here, including using 
cbind() and subset():

x <- rep(LETTERS[1:20],3)
y <- rep(1:3, 20)
z <- paste(x,y, sep="")
random.data <- rnorm(60)
data <- data.frame(z, random.data)
data[grepl("[A-J]", z), ]
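For readers wondering why the two attempts in the quoted message behaved as they did, here is a minimal sketch. It assumes only base R; the names `first_only`, `exact` and `by_class` are illustrative and not from the original post.

```r
x <- rep(LETTERS[1:20], 3)
y <- rep(1:3, 20)
z <- paste(x, y, sep = "")

# grepl() expects a single pattern. Given LETTERS[1:10] it uses only the
# first element (with a warning), so the first attempt was effectively:
first_only <- grepl("A", z)

# %in% tests whole-string equality, and no element of z is a bare letter
# (they all look like "A1", "B2", ...), so nothing matches.
exact <- z %in% LETTERS[1:10]

# A character class matches any single letter from A to J within the string.
by_class <- grepl("[A-J]", z)

sum(first_only)  # 3 rows, all starting with "A"
sum(exact)       # 0 rows
sum(by_class)    # 30 rows
```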

Now (for the paranoid and not needed in this example) in general the 
effect of a range such as "[A-J]" depends on the locale, so you could 
write out "[ABCDEFGHIJ]" or create it by

cond <- paste("[", paste(LETTERS[1:10], collapse=""), "]", sep="")

Or use grepl("[A-J]", z, perl=TRUE).
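A short sketch comparing these alternatives, assuming base R only. The names `explicit`, `ranged` and `with_perl` are illustrative. The first comparison should be TRUE in the C locale and in typical English locales, but that is exactly the part that may vary by locale.

```r
z <- paste(rep(LETTERS[1:20], 3), rep(1:3, 20), sep = "")

# Build the explicit character class, avoiding any range expression.
cond <- paste("[", paste(LETTERS[1:10], collapse = ""), "]", sep = "")
cond

explicit  <- grepl(cond, z)                   # no range, locale-independent
ranged    <- grepl("[A-J]", z)                # range, may depend on locale
with_perl <- grepl("[A-J]", z, perl = TRUE)   # range interpreted by PCRE

# Check whether the range-based forms agree with the explicit class here.
identical(ranged, explicit)
identical(with_perl, explicit)
```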

On Sat, 29 Jan 2011, Kang Min wrote:

> Hi all,
>
> I would like to subset a dataframe by using part of the level name.
>
> x <- rep(LETTERS[1:20],3)
> y <- rep(1:3, 20)
> z <- paste(x,y, sep="")
> random.data <- rnorm(60)
> data <- as.data.frame(cbind(z, random.data))
>
> I need rows that contain the letters A to J, so I tried:
>
> subset(data, grepl(LETTERS[1:10], z)) # got only rows with A
> subset(data, z %in% LETTERS[1:10]) # got no rows
>
> I think I'm getting close to the solution but need a little bit of
> help here, thanks in advance.
>
> Kang Min

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


