[R] How to get multiple partial matches?

Thu Sep 7 02:01:38 CEST 2006

Try using 'grep' and regular expressions:

> x <- "72     5S_F_1            501          567
+ 7700   5S_F_2            338          611
+ 7517   5S_F_3            412          467
+ 10687  5S_F_4            380          428
+ 4870   5S_F_5            315          368
+ 6035   5S_F_6            300          359
+ 3826   5S_F_7            350          386
+ 8754   5S_F_8            450          473
+ 6399   5S_F_9            439          494
+ 749   5S_F_10            334          384
+ "
> df <- read.table(textConnection(x))
> df
      V1      V2  V3  V4
1     72  5S_F_1 501 567
2   7700  5S_F_2 338 611
3   7517  5S_F_3 412 467
4  10687  5S_F_4 380 428
5   4870  5S_F_5 315 368
6   6035  5S_F_6 300 359
7   3826  5S_F_7 350 386
8   8754  5S_F_8 450 473
9   6399  5S_F_9 439 494
10   749 5S_F_10 334 384
> # select rows whose V2 contains '5S_F_1' (this also matches '5S_F_10')
> df[grep('5S_F_1', as.character(df$V2)),]
    V1      V2  V3  V4
1   72  5S_F_1 501 567
10 749 5S_F_10 334 384
>
>
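
To pick up every "5S_F_" probe rather than one particular name, the same idea can be sketched as below. This is only an illustration under a few assumptions: the data frame is rebuilt by hand with the default column names V1..V4, the "18S_R_1" row is made up to stand in for the non-"5S_F_" probes in the real data set, and V2 is kept as character rather than factor.

```r
# Small stand-in for the real data; the "18S_R_1" row is hypothetical
df <- data.frame(
  V1 = c(72, 7700, 749, 9999),
  V2 = c("5S_F_1", "5S_F_2", "5S_F_10", "18S_R_1"),
  V3 = c(501, 338, 334, 400),
  V4 = c(567, 611, 384, 450),
  stringsAsFactors = FALSE
)

# grepl() returns one TRUE/FALSE per element, so it can index rows directly;
# "^" anchors the pattern to the start of the string
hits <- df[grepl("^5S_F_", df$V2), ]

# fixed = TRUE treats the pattern as a literal string instead of a regex
hits_fixed <- df[grep("5S_F_", df$V2, fixed = TRUE), ]

# Anchoring both ends keeps '5S_F_1' and drops '5S_F_10'
only_1 <- df[grepl("^5S_F_1$", df$V2), ]
```

Here 'hits' and 'hits_fixed' keep the three "5S_F_" rows, and 'only_1' keeps just the row for 5S_F_1. The key point is to pass the Probe column (df$V2 here) to grep, not the whole data frame.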

On 9/6/06, Sarah Tucker <sltucker15 at yahoo.com> wrote:
> Hi,
>
> I'm very new to R, and am not at all a software
> programmer of any sort.    I appreciate any help you
> may have.  I have figured out how to get my data into
> a dataframe and order it alphabetically according to a
> particular column.  Now, I would like to separate out
> certain rows based on partial character matches.  Here
> is an (extremely) abbreviated example of my data set:
>
>        Probe Ch1 Median - B Ch1 Mean - B
> 72     5S_F_1            501          567
> 7700   5S_F_2            338          611
> 7517   5S_F_3            412          467
> 10687  5S_F_4            380          428
> 4870   5S_F_5            315          368
> 6035   5S_F_6            300          359
> 3826   5S_F_7            350          386
> 8754   5S_F_8            450          473
> 6399   5S_F_9            439          494
> 749   5S_F_10            334          384
>
> I would like to be able to select out all rows with,
> for example, "5S_F_" in the Probe column (there are
> non-"5S_F_" containing values in the real, larger data
> set).
>
> I think pmatch does this for instances where there is
> only 1 match, but I would like to recover all the
> matches.  I have tried to use charmatch, match,
> pmatch, agrep and grep for this purpose, but with no
> luck.
>
> When I grep for "5S_F_" with value = T, I get
> "character(0)"
> Adding wildcards (either "*" or ".") does not change
> this outcome.
>
> I thought maybe the underscores were messing it up, so
> I tried to grep "5S*" with value = T, and I get a long
> list of numbers back
>
> [1] "55"   "95"   "56"   "57"   "58"   "59"   "65"
> "75"   "85"   "105"
>  [11] "115"  "125"  "135"  "5"    "5"    "5"    "5"
>  "5"    "5"    "5"
>
> These numbers make no sense to me.  They don't seem to
> correlate with where the "5S"'s occur in the
> dataframe, and they don't look like any values in the
> Probe column (there are no numeric values in the Probe
> column, just strings of character digit combinations).
>
> How can I select out all the rows with the same
> partial character match?
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
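
On the "5S*" attempt quoted above: in a regular expression "*" is not a shell-style wildcard. It means "zero or more of the preceding character", so "5S*" matches any string that contains a "5". I can't tell from the message which vector was actually searched, but that would explain getting back values like "55" and "105". A small sketch with made-up strings (the 'vals' vector is purely illustrative):

```r
# Hypothetical values, chosen only to show how the patterns behave
vals <- c("55", "105", "5S_F_1", "12", "S_only")

# "5S*" = a "5" followed by zero or more "S": anything containing a 5
star_hits <- grep("5S*", vals, value = TRUE)

# Plain "5S" requires the two characters together
plain_hits <- grep("5S", vals, value = TRUE)

# glob2rx() (in the utils package) converts a shell-style wildcard
# pattern into a regular expression
glob_hits <- grep(glob2rx("5S*"), vals, value = TRUE)
```

With these values 'star_hits' contains "55", "105" and "5S_F_1", while 'plain_hits' and 'glob_hits' contain only "5S_F_1".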

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?