[R] Subset using grepl
Kang Min
ngokangmin at gmail.com
Thu Mar 17 04:29:22 CET 2011
I have a new question, also regarding grepl.
I would like to subset rows with numbers from 1 to 5 in the section
column, so I used
subset(data, grepl("[1:5]", section))
but this gave me only the rows for sections 10, 1 and 5. (Why is this so?) So I tried
subset(data, grepl("[1,2,3,4,5]", section))
which returned sections 1 to 5, but also the rows for section 10. How can I
exclude 10?
>data
section piece LTc1 LTc2
10a 10-1 0.729095368 NA
10a 10-2 59.53292189 13.95612454
10h 10-3 0.213756661 NA
10i 10-4 NA NA
10b NA NA NA
10c NA NA NA
10d NA NA NA
10e NA NA NA
10f NA NA NA
10g NA NA NA
10h NA NA NA
10j NA NA NA
1b 1-1 NA NA
1d 1-2 29.37971303 12.79688209
1g 1-6 NA 7.607911603
1h 1-3 0.298059164 27.09896941
1i 1-4 25.11261782 19.87149991
1j 1-5 36.66969601 42.28507923
1a NA NA NA
1c NA NA NA
1e NA NA NA
1f NA NA NA
2a 2-1 15.98582117 10.58696146
2a 2-2 0.557308341 41.52650718
2c 2-3 14.99499024 10.0896793
2e 2-4 148.4530636 56.45493191
2f 2-5 25.27493551 12.98808577
2i 2-6 20.32857108 22.76075728
2b NA NA NA
2d NA NA NA
2g NA NA NA
2h NA NA NA
2j NA NA NA
3a 3-1 13.36602867 11.47541439
3a 3-7 NA 111.9007822
3c 3-2 10.57406701 5.587777567
3d 3-3 11.73240891 10.73833651
3e 3-8 NA 14.54214165
3h 3-4 21.56072089 21.59748884
3i 3-5 15.42846935 16.62715409
3i 3-6 129.7367193 121.8206045
3b NA NA NA
3f NA NA NA
3g NA NA NA
3j NA NA NA
5b 5-1 18.61733498 18.13545293
5d 5-3 NA 7.81018526
5f 5-2 12.5158971 14.37884817
5a NA NA NA
5c NA NA NA
5e NA NA NA
5g NA NA NA
5h NA NA NA
5i NA NA NA
5j NA NA NA
9h 9-1 NA NA
9a NA NA NA
9b NA NA NA
9c NA NA NA
9d NA NA NA
9e NA NA NA
9f NA NA NA
9g NA NA NA
9i NA NA NA
9j NA NA NA
8a 8-1 14.29712852 12.83178905
8e 8-2 23.46594953 9.097377872
8f 8-3 NA NA
8f 8-4 22.20001584 20.39646766
8h 8-5 50.54497551 56.93752065
8b NA NA NA
8c NA NA NA
8d NA NA NA
8g NA NA NA
8i NA NA NA
8j NA NA NA
4b 4-1 40.83468857 35.99017683
4f 4-3 NA 182.8060799
4f 4-4 NA 36.81401955
4h 4-2 17.13625062 NA
4a NA NA NA
4c NA NA NA
4d NA NA NA
4e NA NA NA
4g NA NA NA
4i NA NA NA
4j NA NA NA
7b 7-1 8.217605633 8.565035083
7a NA NA NA
7c NA NA NA
7d NA NA NA
7e NA NA NA
7f NA NA NA
7g NA NA NA
7h NA NA NA
7i NA NA NA
7j NA NA NA
6b 6-6 NA 11.57887288
6c 6-1 27.32608984 17.17778959
6c 6-2 78.21988783 61.80558768
6d 6-7 NA 3.599685625
6f 6-3 26.78838281 23.33258286
6h 6-4 NA NA
6h 6-5 NA NA
6a NA NA NA
6e NA NA NA
6g NA NA NA
6i NA NA NA
6j NA NA NA
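A possible explanation, as a sketch rather than a definitive answer: inside square brackets a regular expression lists single characters, so "[1:5]" means "any one of 1, : or 5" and "[1,2,3,4,5]" means "any one of 1 to 5 or a comma". Neither pattern is anchored, so "10a" matches through its leading 1. The short `section` vector below is a hypothetical stand-in for the column above, and the anchored pattern assumes every section is a number followed by a letter.

```r
# Hypothetical stand-in for the 'section' column in the data above
section <- c("10a", "10h", "1b", "2a", "3c", "4f", "5d", "6b", "9h")

# "[1:5]" is a character class: any ONE of '1', ':' or '5', anywhere
hits_colon <- section[grepl("[1:5]", section)]

# "[1,2,3,4,5]" is the class '1'..'5' plus ',', still matched anywhere
hits_comma <- section[grepl("[1,2,3,4,5]", section)]

# Anchor at the start and require a non-digit right after the first digit,
# so "10a" is rejected because '1' is followed by '0'
hits_anchor <- section[grepl("^[1-5][^0-9]", section)]

print(hits_colon)
print(hits_comma)
print(hits_anchor)
```

On the real data frame the same idea would read subset(data, grepl("^[1-5][^0-9]", section)), assuming section is a character or factor column.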
On Jan 29, 10:43 pm, Prof Brian Ripley <rip... at stats.ox.ac.uk> wrote:
> On Sat, 29 Jan 2011, Kang Min wrote:
> > Thanks Prof Ripley, the condition worked!
> > Btw I tried to search ?repl but I don't have documentation for it. Is
> > it in a non-basic package?
>
> I meant grepl: the edit messed up (but not on my screen, as sometimes
> happens when working remotely). The point is that 'perl=TRUE'
> guarantees that [A-J] is interpreted in ASCII order.
>
> > On Jan 29, 6:54 pm, Prof Brian Ripley <rip... at stats.ox.ac.uk> wrote:
> >> The grep condition is "[A-J]"
>
> >> BTW, why are there so many unnecessary steps here, including using
> >> cbind() and subset():
>
> >> x <- rep(LETTERS[1:20],3)
> >> y <- rep(1:3, 20)
> >> z <- paste(x,y, sep="")
> >> random.data <- rnorm(60)
> >> data <- data.frame(z, random.data)
> >> data[grepl("[A-J]", z), ]
>
> >> Now (for the paranoid and not needed in this example) in general the
> >> effect of "[A-Z]" depends on the locale, so you could write out
> >> "[ABCDEFGHIJ]" or create it by
>
> >> cond <- paste("[", paste(LETTERS[1:10], collapse=""), "]", sep="")
>
> >> Or use repl("[A-J]", z, perl=TRUE).
>
> >> On Sat, 29 Jan 2011, Kang Min wrote:
> >>> Hi all,
>
> >>> I would like to subset a dataframe by using part of the level name.
>
> >>> x <- rep(LETTERS[1:20],3)
> >>> y <- rep(1:3, 20)
> >>> z <- paste(x,y, sep="")
> >>> random.data <- rnorm(60)
> >>> data <- as.data.frame(cbind(z, random.data))
>
> >>> I need rows that contain the letters A to J, so I tried:
>
> >>> subset(data, grepl(LETTERS[1:10], z)) # got only rows with A
> >>> subset(data, z %in% LETTERS[1:10]) # got no rows
>
> >>> I think I'm getting close to the solution but need a little bit of
> >>> help here, thanks in advance.
>
> >>> Kang Min
>
> >>> ______________________________________________
> >>> R-h... at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
>
> >> --
> >> Brian D. Ripley, rip... at stats.ox.ac.uk
> >> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> >> University of Oxford, Tel: +44 1865 272861 (self)
> >> 1 South Parks Road, +44 1865 272866 (PA)
> >> Oxford OX1 3TG, UK Fax: +44 1865 272595
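For reference, the points made in the quoted thread can be put together in one small sketch. It reuses the x, y and z from the original question; keep and keep2 are illustrative names, and anchoring the pattern with "^" is an added assumption that the letter always comes first. The first attempt in the question returned only rows with A because grepl() takes a single pattern and uses just the first element of a vector; the second returned nothing because %in% compares whole strings.

```r
# Same construction as in the original question
x <- rep(LETTERS[1:20], 3)
y <- rep(1:3, 20)
z <- paste(x, y, sep = "")

# %in% compares whole strings, so "A1" never equals "A"
exact <- z %in% LETTERS[1:10]

# Spell the class out so it does not depend on locale collation
cond <- paste0("[", paste(LETTERS[1:10], collapse = ""), "]")

# Anchored, so the letter has to be the first character of z
keep <- grepl(paste0("^", cond), z)

# Comparing the first character directly avoids regular expressions
keep2 <- substr(z, 1, 1) %in% LETTERS[1:10]

print(cond)
print(sum(keep))
```

Either keep or keep2 can then index the data frame, for example data[keep, ].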