[R] Subset using grepl

Kang Min ngokangmin at gmail.com
Thu Mar 17 06:40:05 CET 2011


Ok thank you!

On Mar 17, 12:12 pm, <Bill.Venab... at csiro.au> wrote:
> subset(data,grepl("[1-5]", section) & !grepl("0", section))
>
> BTW
>
> grepl("[1:5]", section)
>
> does work.  It checks for the characters 1, :, or 5.  
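
A minimal sketch of the difference between the two character classes, using hypothetical section labels in the style of the posted data. The anchored pattern at the end is an alternative that is not taken from the thread; it assumes every label starts with its number and continues with a non-digit.

```r
# Hypothetical section labels modelled on the posted data
section <- c("10a", "1b", "2a", "3c", "4f", "5d", "6b", "9h")

# "[1:5]" is a set of three literal characters: "1", ":" and "5"
colon_class <- grepl("[1:5]", section)

# "[1-5]" is a range: any one of the digits 1 to 5, anywhere in the string,
# so "10a" still matches because it contains a "1"
range_class <- grepl("[1-5]", section)

# Alternative (assumption): anchor at the start and require a non-digit
# as the second character, which rules out "10a"
keep <- grepl("^[1-5][^0-9]", section)
section[keep]
```

With these labels, the colon version matches only "10a", "1b" and "5d", while the anchored version keeps the single-digit sections 1 to 5.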
>
> -----Original Message-----
> From: r-help-boun... at r-project.org [mailto:r-help-boun... at r-project.org] On Behalf Of Kang Min
> Sent: Thursday, 17 March 2011 1:29 PM
> To: r-h... at r-project.org
> Subject: Re: [R] Subset using grepl
>
> I have a new question, also regarding grepl.
> I would like to subset rows with numbers from 1 to 5 in the section
> column, so I used
>
> subset(data,grepl("[1:5]", section))
>
> but this gave me rows with 10, 1 and 5. (Why is this so?) So I tried
>
> subset(data,grepl("[1,2,3,4,5]", section))
>
> which worked. But I also got 10 in the dataframe as well. How can I
> exclude 10?
>
> >data
> section piece   LTc1    LTc2
> 10a     10-1    0.729095368     NA
> 10a     10-2    59.53292189     13.95612454
> 10h     10-3    0.213756661     NA
> 10i     10-4    NA      NA
> 10b     NA      NA      NA
> 10c     NA      NA      NA
> 10d     NA      NA      NA
> 10e     NA      NA      NA
> 10f     NA      NA      NA
> 10g     NA      NA      NA
> 10h     NA      NA      NA
> 10j     NA      NA      NA
> 1b      1-1     NA      NA
> 1d      1-2     29.37971303     12.79688209
> 1g      1-6     NA      7.607911603
> 1h      1-3     0.298059164     27.09896941
> 1i      1-4     25.11261782     19.87149991
> 1j      1-5     36.66969601     42.28507923
> 1a      NA      NA      NA
> 1c      NA      NA      NA
> 1e      NA      NA      NA
> 1f      NA      NA      NA
> 2a      2-1     15.98582117     10.58696146
> 2a      2-2     0.557308341     41.52650718
> 2c      2-3     14.99499024     10.0896793
> 2e      2-4     148.4530636     56.45493191
> 2f      2-5     25.27493551     12.98808577
> 2i      2-6     20.32857108     22.76075728
> 2b      NA      NA      NA
> 2d      NA      NA      NA
> 2g      NA      NA      NA
> 2h      NA      NA      NA
> 2j      NA      NA      NA
> 3a      3-1     13.36602867     11.47541439
> 3a      3-7     NA      111.9007822
> 3c      3-2     10.57406701     5.587777567
> 3d      3-3     11.73240891     10.73833651
> 3e      3-8     NA      14.54214165
> 3h      3-4     21.56072089     21.59748884
> 3i      3-5     15.42846935     16.62715409
> 3i      3-6     129.7367193     121.8206045
> 3b      NA      NA      NA
> 3f      NA      NA      NA
> 3g      NA      NA      NA
> 3j      NA      NA      NA
> 5b      5-1     18.61733498     18.13545293
> 5d      5-3     NA      7.81018526
> 5f      5-2     12.5158971      14.37884817
> 5a      NA      NA      NA
> 5c      NA      NA      NA
> 5e      NA      NA      NA
> 5g      NA      NA      NA
> 5h      NA      NA      NA
> 5i      NA      NA      NA
> 5j      NA      NA      NA
> 9h      9-1     NA      NA
> 9a      NA      NA      NA
> 9b      NA      NA      NA
> 9c      NA      NA      NA
> 9d      NA      NA      NA
> 9e      NA      NA      NA
> 9f      NA      NA      NA
> 9g      NA      NA      NA
> 9i      NA      NA      NA
> 9j      NA      NA      NA
> 8a      8-1     14.29712852     12.83178905
> 8e      8-2     23.46594953     9.097377872
> 8f      8-3     NA      NA
> 8f      8-4     22.20001584     20.39646766
> 8h      8-5     50.54497551     56.93752065
> 8b      NA      NA      NA
> 8c      NA      NA      NA
> 8d      NA      NA      NA
> 8g      NA      NA      NA
> 8i      NA      NA      NA
> 8j      NA      NA      NA
> 4b      4-1     40.83468857     35.99017683
> 4f      4-3     NA      182.8060799
> 4f      4-4     NA      36.81401955
> 4h      4-2     17.13625062     NA
> 4a      NA      NA      NA
> 4c      NA      NA      NA
> 4d      NA      NA      NA
> 4e      NA      NA      NA
> 4g      NA      NA      NA
> 4i      NA      NA      NA
> 4j      NA      NA      NA
> 7b      7-1     8.217605633     8.565035083
> 7a      NA      NA      NA
> 7c      NA      NA      NA
> 7d      NA      NA      NA
> 7e      NA      NA      NA
> 7f      NA      NA      NA
> 7g      NA      NA      NA
> 7h      NA      NA      NA
> 7i      NA      NA      NA
> 7j      NA      NA      NA
> 6b      6-6     NA      11.57887288
> 6c      6-1     27.32608984     17.17778959
> 6c      6-2     78.21988783     61.80558768
> 6d      6-7     NA      3.599685625
> 6f      6-3     26.78838281     23.33258286
> 6h      6-4     NA      NA
> 6h      6-5     NA      NA
> 6a      NA      NA      NA
> 6e      NA      NA      NA
> 6g      NA      NA      NA
> 6i      NA      NA      NA
> 6j      NA      NA      NA
>
> On Jan 29, 10:43 pm, Prof Brian Ripley <rip... at stats.ox.ac.uk> wrote:
> > On Sat, 29 Jan 2011, Kang Min wrote:
> > > Thanks Prof Ripley, the condition worked!
> > > Btw I tried to search ?repl but I don't have documentation for it. Is
> > > it in a non-basic package?
>
> > I meant grepl: the edit messed up (but not on my screen, as sometimes
> > happens when working remotely).  The point is that 'perl=TRUE'
> > guarantees that [A-J] is interpreted in ASCII order.
>
> > > On Jan 29, 6:54 pm, Prof Brian Ripley <rip... at stats.ox.ac.uk> wrote:
> > >> The grep condition is "[A-J]"
>
> > >> BTW, why are there lots of unnecessary steps here, including using
> > >> cbind() and subset():
>
> > >> x <- rep(LETTERS[1:20],3)
> > >> y <- rep(1:3, 20)
> > >> z <- paste(x,y, sep="")
> > >> random.data <- rnorm(60)
> > >> data <- data.frame(z, random.data)
> > >> data[grepl("[A-J]", z), ]
>
> > >> Now (for the paranoid and not needed in this example) in general the
> > >> effect of "[A-Z]" depends on the locale, so you could write out
> > >> "[ABCDEFGHIJ]" or create it by
>
> > >> cond <- paste("[", paste(LETTERS[1:10], collapse=""), "]", sep="")
>
> > >> Or use repl("[A-J]", z, perl=TRUE).
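
A minimal sketch comparing the two locale-independent options mentioned above: the written-out class built with paste(), and the range with perl=TRUE passed to grepl(). The seed and the names by_class and by_perl are illustrative assumptions, not part of the original post.

```r
set.seed(1)  # arbitrary seed, only so the example is repeatable

x <- rep(LETTERS[1:20], 3)
y <- rep(1:3, 20)
z <- paste(x, y, sep = "")
data <- data.frame(z, random.data = rnorm(60))

# Written-out class: every letter is listed, so no range ordering is involved
cond <- paste("[", paste(LETTERS[1:10], collapse = ""), "]", sep = "")
by_class <- data[grepl(cond, data$z), ]

# Range form, evaluated by the Perl-compatible engine
by_perl <- data[grepl("[A-J]", data$z, perl = TRUE), ]

# Both approaches should select the same rows
identical(by_class, by_perl)
```

Each of the letters A to J occurs three times in z, so both subsets contain 30 rows.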
>
> > >> On Sat, 29 Jan 2011, Kang Min wrote:
> > >>> Hi all,
>
> > >>> I would like to subset a dataframe by using part of the level name.
>
> > >>> x <- rep(LETTERS[1:20],3)
> > >>> y <- rep(1:3, 20)
> > >>> z <- paste(x,y, sep="")
> > >>> random.data <- rnorm(60)
> > >>> data <- as.data.frame(cbind(z, random.data))
>
> > >>> I need rows that contain the letters A to J, so I tried:
>
> > >>> subset(data, grepl(LETTERS[1:10], z)) # got only rows with A
> > >>> subset(data, z %in% LETTERS[1:10]) # got no rows
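
A minimal sketch of why the two attempts above behave as reported, as far as I understand grepl() and %in%. The substr() variant at the end is an assumption of mine rather than something proposed in the thread, and it relies on every label starting with its letter.

```r
z <- paste(rep(LETTERS[1:20], 3), rep(1:3, 20), sep = "")

# grepl() takes a single pattern; given ten, it uses only the first ("A")
# and warns, which would explain why only the rows with A came back
only_first <- suppressWarnings(grepl(LETTERS[1:10], z))
identical(only_first, grepl("A", z))

# %in% tests whole-string equality, and no label such as "A1" equals "A"
any_exact <- any(z %in% LETTERS[1:10])

# Comparing only the leading letter makes %in% usable (assumption:
# the letter is always the first character of the label)
keep <- substr(z, 1, 1) %in% LETTERS[1:10]
sum(keep)
```

Here any_exact is FALSE, matching the "got no rows" result, while keep selects the 30 labels that begin with A to J.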
>
> > >>> I think I'm getting close to the solution but need a little bit of
> > >>> help here, thanks in advance.
>
> > >>> Kang Min
>
> > >>> ______________________________________________
> > >>> R-h... at r-project.org mailing list
> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > >>> and provide commented, minimal, self-contained, reproducible code.
>
> > >> --
> > >> Brian D. Ripley,                  rip... at stats.ox.ac.uk
> > >> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> > >> University of Oxford,             Tel:  +44 1865 272861 (self)
> > >> 1 South Parks Road,                     +44 1865 272866 (PA)
> > >> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
> > --
> > Brian D. Ripley,                  rip... at stats.ox.ac.uk
> > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> > University of Oxford,             Tel:  +44 1865 272861 (self)
> > 1 South Parks Road,                     +44 1865 272866 (PA)
> > Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>


