[R] Subset using grepl

Kang Min ngokangmin at gmail.com
Thu Mar 17 04:29:22 CET 2011


I have a new question, also regarding grepl.
I would like to subset rows with numbers from 1 to 5 in the section
column, so I used

subset(data, grepl("[1:5]", section))

but this gave me rows with 10, 1 and 5. (Why is this so?) So I tried

subset(data, grepl("[1,2,3,4,5]", section))

which worked, but rows with 10 were still included in the data frame.
How can I exclude 10?
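A likely explanation for both results: in a regular expression, square brackets form a character class that matches any single character listed inside. R's `1:5` sequence syntax has no special meaning there, so "[1:5]" matches one '1', ':' or '5', and "[1,2,3,4,5]" matches one of the digits 1 to 5 or a comma. Because grepl() reports a match anywhere in the string, "10a" matches both patterns through its leading '1'. A small sketch on a few section values copied from the data below:

```r
# A few values of the 'section' column from the posted data
section <- c("10a", "1b", "2a", "3c", "4b", "5d", "6c", "9h")

# "[1:5]" is a character class: one '1', ':' or '5', anywhere in the string
grepl("[1:5]", section)
# TRUE TRUE FALSE FALSE FALSE TRUE FALSE FALSE  (10, 1 and 5 only)

# "[1,2,3,4,5]" is also a character class: one of the digits 1-5, or ','
grepl("[1,2,3,4,5]", section)
# TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE  ("10a" still matches via '1')
```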

> data
section	piece	LTc1	LTc2
10a	10-1	0.729095368	NA
10a	10-2	59.53292189	13.95612454
10h	10-3	0.213756661	NA
10i	10-4	NA	NA
10b	NA	NA	NA
10c	NA	NA	NA
10d	NA	NA	NA
10e	NA	NA	NA
10f	NA	NA	NA
10g	NA	NA	NA
10h	NA	NA	NA
10j	NA	NA	NA
1b	1-1	NA	NA
1d	1-2	29.37971303	12.79688209
1g	1-6	NA	7.607911603
1h	1-3	0.298059164	27.09896941
1i	1-4	25.11261782	19.87149991
1j	1-5	36.66969601	42.28507923
1a	NA	NA	NA
1c	NA	NA	NA
1e	NA	NA	NA
1f	NA	NA	NA
2a	2-1	15.98582117	10.58696146
2a	2-2	0.557308341	41.52650718
2c	2-3	14.99499024	10.0896793
2e	2-4	148.4530636	56.45493191
2f	2-5	25.27493551	12.98808577
2i	2-6	20.32857108	22.76075728
2b	NA	NA	NA
2d	NA	NA	NA
2g	NA	NA	NA
2h	NA	NA	NA
2j	NA	NA	NA
3a	3-1	13.36602867	11.47541439
3a	3-7	NA	111.9007822
3c	3-2	10.57406701	5.587777567
3d	3-3	11.73240891	10.73833651
3e	3-8	NA	14.54214165
3h	3-4	21.56072089	21.59748884
3i	3-5	15.42846935	16.62715409
3i	3-6	129.7367193	121.8206045
3b	NA	NA	NA
3f	NA	NA	NA
3g	NA	NA	NA
3j	NA	NA	NA
5b	5-1	18.61733498	18.13545293
5d	5-3	NA	7.81018526
5f	5-2	12.5158971	14.37884817
5a	NA	NA	NA
5c	NA	NA	NA
5e	NA	NA	NA
5g	NA	NA	NA
5h	NA	NA	NA
5i	NA	NA	NA
5j	NA	NA	NA
9h	9-1	NA	NA
9a	NA	NA	NA
9b	NA	NA	NA
9c	NA	NA	NA
9d	NA	NA	NA
9e	NA	NA	NA
9f	NA	NA	NA
9g	NA	NA	NA
9i	NA	NA	NA
9j	NA	NA	NA
8a	8-1	14.29712852	12.83178905
8e	8-2	23.46594953	9.097377872
8f	8-3	NA	NA
8f	8-4	22.20001584	20.39646766
8h	8-5	50.54497551	56.93752065
8b	NA	NA	NA
8c	NA	NA	NA
8d	NA	NA	NA
8g	NA	NA	NA
8i	NA	NA	NA
8j	NA	NA	NA
4b	4-1	40.83468857	35.99017683
4f	4-3	NA	182.8060799
4f	4-4	NA	36.81401955
4h	4-2	17.13625062	NA
4a	NA	NA	NA
4c	NA	NA	NA
4d	NA	NA	NA
4e	NA	NA	NA
4g	NA	NA	NA
4i	NA	NA	NA
4j	NA	NA	NA
7b	7-1	8.217605633	8.565035083
7a	NA	NA	NA
7c	NA	NA	NA
7d	NA	NA	NA
7e	NA	NA	NA
7f	NA	NA	NA
7g	NA	NA	NA
7h	NA	NA	NA
7i	NA	NA	NA
7j	NA	NA	NA
6b	6-6	NA	11.57887288
6c	6-1	27.32608984	17.17778959
6c	6-2	78.21988783	61.80558768
6d	6-7	NA	3.599685625
6f	6-3	26.78838281	23.33258286
6h	6-4	NA	NA
6h	6-5	NA	NA
6a	NA	NA	NA
6e	NA	NA	NA
6g	NA	NA	NA
6i	NA	NA	NA
6j	NA	NA	NA
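One way to exclude 10, sketched here under the assumption that every section label is a number followed by a single letter a to j (as in the data above): anchor the pattern with ^ and $ so the digit 1 to 5 has to be the whole number part. perl = TRUE is used so the ranges are read in ASCII order, as suggested further down the thread. A non-regex alternative strips the trailing letter and compares the number itself; the helper name `sec.num` is only illustrative.

```r
# A few rows copied from the posted data (only 'section' and 'piece')
data <- data.frame(
  section = c("10a", "10h", "1b", "1d", "2a", "3c", "5b", "6c", "9h"),
  piece   = c("10-1", "10-3", "1-1", "1-2", "2-1", "3-2", "5-1", "6-1", "9-1"),
  stringsAsFactors = FALSE
)

# Option 1: ^ = start, one digit 1-5, one letter a-j, $ = end.
# "10a" fails because the character after '1' is '0', not a letter.
subset(data, grepl("^[1-5][a-j]$", section, perl = TRUE))

# Option 2: drop the trailing non-digits, convert to integer, test membership
sec.num <- as.integer(sub("[^0-9]+$", "", data$section))
data[sec.num %in% 1:5, ]
```

Option 2 does not depend on how the labels are spelled beyond "number first", so it may be easier to adapt if the range changes (e.g. `%in% 6:10`).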


On Jan 29, 10:43 pm, Prof Brian Ripley <rip... at stats.ox.ac.uk> wrote:
> On Sat, 29 Jan 2011, Kang Min wrote:
> > Thanks Prof Ripley, the condition worked!
> > Btw I tried to search ?repl but I don't have documentation for it. Is
> > it in a non-basic package?
>
> I meant grepl: the edit messed up (but not on my screen, as sometimes
> happens when working remotely).  The point is that 'perl=TRUE'
> guarantees that [A-J] is interpreted in ASCII order.
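A minimal sketch of the perl = TRUE variant, using labels built the same way as in the original question. With the default regex engine the meaning of a range such as [A-J] may depend on the locale; with perl = TRUE it is taken in ASCII order, so only A to J should match.

```r
# Labels as in the original question: letters A-T combined with 1-3
z <- paste(rep(LETTERS[1:20], 3), rep(1:3, 20), sep = "")

# [A-J] interpreted in ASCII order: exactly the letters A..J
keep <- grepl("[A-J]", z, perl = TRUE)
table(substr(z[keep], 1, 1))   # each of A to J, 3 times
```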
>
> > On Jan 29, 6:54 pm, Prof Brian Ripley <rip... at stats.ox.ac.uk> wrote:
> >> The grep condition is "[A-J]"
>
> >> BTW, there are lots of unnecessary steps here, including using
> >> cbind() and subset():
>
> >> x <- rep(LETTERS[1:20],3)
> >> y <- rep(1:3, 20)
> >> z <- paste(x,y, sep="")
> >> random.data <- rnorm(60)
> >> data <- data.frame(z, random.data)
> >> data[grepl("[A-J]", z), ]
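One concrete reason to drop cbind() here, sketched below: cbind() on a character vector and a numeric vector builds a character matrix, so the numbers become strings (factors in R versions before 4.0.0), whereas data.frame() keeps each column's own type.

```r
x <- rep(LETTERS[1:20], 3)
y <- rep(1:3, 20)
z <- paste(x, y, sep = "")
random.data <- rnorm(60)

# cbind() coerces everything to a common type (character) first
via.cbind <- as.data.frame(cbind(z, random.data))
class(via.cbind$random.data)   # "character" (a factor before R 4.0.0)

# data.frame() keeps random.data numeric
direct <- data.frame(z, random.data)
class(direct$random.data)      # "numeric"

# Subsetting with a logical index, without subset()
nrow(direct[grepl("[A-J]", z), ])   # 30
```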
>
> >> Now (for the paranoid, and not needed in this example): in general the
> >> effect of "[A-Z]" depends on the locale, so you could write out
> >> "[ABCDEFGHIJ]" or create it by
>
> >> cond <- paste("[", paste(LETTERS[1:10], collapse=""), "]", sep="")
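A quick check of what that line produces: LETTERS is a fixed constant, so the class is spelled out letter by letter and no range (hence no locale question) is involved.

```r
# Build the explicit class "[ABCDEFGHIJ]" from LETTERS
cond <- paste("[", paste(LETTERS[1:10], collapse = ""), "]", sep = "")
cond   # "[ABCDEFGHIJ]"

# Use it with the default engine; no range is involved
z <- paste(rep(LETTERS[1:20], 3), rep(1:3, 20), sep = "")
sum(grepl(cond, z))   # 30
```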
>
> >> Or use repl("[A-J]", z, perl=TRUE).
>
> >> On Sat, 29 Jan 2011, Kang Min wrote:
> >>> Hi all,
>
> >>> I would like to subset a dataframe by using part of the level name.
>
> >>> x <- rep(LETTERS[1:20],3)
> >>> y <- rep(1:3, 20)
> >>> z <- paste(x,y, sep="")
> >>> random.data <- rnorm(60)
> >>> data <- as.data.frame(cbind(z, random.data))
>
> >>> I need rows that contain the letters A to J, so I tried:
>
> >>> subset(data, grepl(LETTERS[1:10], z)) # got only rows with A
> >>> subset(data, z %in% LETTERS[1:10]) # got no rows
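Why the two attempts above behave as they do, sketched on the same data: grepl() uses only the first element of a pattern vector (here "A") and issues a warning, while %in% compares whole strings, and "A1" is never equal to "A". Comparing just the first character is shown as one alternative to a regex; it is not from the thread.

```r
x <- rep(LETTERS[1:20], 3)
y <- rep(1:3, 20)
z <- paste(x, y, sep = "")

# Only the first pattern element ("A") is used; the warning is suppressed here
only.A <- suppressWarnings(grepl(LETTERS[1:10], z))
unique(substr(z[only.A], 1, 1))   # "A"

# %in% tests exact equality of whole strings, so nothing matches
any(z %in% LETTERS[1:10])         # FALSE

# Comparing the first character instead does select A to J
keep <- substr(z, 1, 1) %in% LETTERS[1:10]
sum(keep)                         # 30
```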
>
> >>> I think I'm getting close to the solution but need a little bit of
> >>> help here, thanks in advance.
>
> >>> Kang Min
>
> >>> ______________________________________________
> >>> R-h... at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
>
> >> --
> >> Brian D. Ripley,                  rip... at stats.ox.ac.uk
> >> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> >> University of Oxford,             Tel:  +44 1865 272861 (self)
> >> 1 South Parks Road,                     +44 1865 272866 (PA)
> >> Oxford OX1 3TG, UK                Fax:  +44 1865 272595