[R] lines those not started with "rs"

Bert Gunter bgunter.4567 at gmail.com
Mon Jan 30 20:18:36 CET 2017


Rui, et al.:

**IF** the data set can be read into R (3e6 lines x ? bytes/line ??),
then I think that for a completely specified, regular pattern such as
the one described by the OP, grep would be a bit inefficient. If x is a
vector of strings and you wish to keep only those that don't begin with
"rs", then:

 x[substring(x, 1, 2) != "rs"]

took about half the time of the grepl() version on my computer for a
vector x of length 1e6.
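
A rough sketch of how such a comparison might be run is given here. The
test vector is made up (random IDs, roughly 90% starting with "rs"), and
startsWith() is an extra base R alternative (R >= 3.3.0) that is not
part of the comparison described above. Timings will vary by machine and
R version, so none are shown.

```r
# Hypothetical test data: 1e6 IDs, roughly 90% of them starting with "rs".
set.seed(1)
n <- 1e6
prefix <- sample(c("rs", "ab"), n, replace = TRUE, prob = c(0.9, 0.1))
x <- paste0(prefix, sample.int(1e8, n, replace = TRUE))

# Three ways to build the same logical index: TRUE where the string
# does NOT start with "rs".
keep_substring  <- substring(x, 1, 2) != "rs"
keep_grepl      <- !grepl("^rs", x)
keep_startswith <- !startsWith(x, "rs")  # assumption: R >= 3.3.0

# All three should select exactly the same elements.
stopifnot(identical(keep_substring, keep_grepl),
          identical(keep_substring, keep_startswith))

# Compare elapsed times; results depend on the machine.
print(system.time(x[substring(x, 1, 2) != "rs"]))
print(system.time(x[!grepl("^rs", x)]))
print(system.time(x[!startsWith(x, "rs")]))
```

For a more careful measurement, each call could be repeated several
times and the elapsed times averaged.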

To be fair, I suspect this may be a negligible difference, as most of
the time would probably be taken in extracting and replacing rows from
the data frame. Nevertheless, it seems worthwhile to highlight the use
of simple, efficient, albeit limited, tools when they *can* be used.
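
As an illustration only, the same index can be applied to the rows of a
data frame. The sketch below reuses the small example data frame from
Rui's message quoted further down; stringsAsFactors = FALSE is an added
assumption so that column A stays a character vector.

```r
# Example data taken from Rui's quoted message.
dat <- data.frame(A = c("rs10000056", "rs10000076", "ab1234567"),
                  x = 1:3,
                  stringsAsFactors = FALSE)

# Logical index: TRUE for rows whose first column does not start with "rs".
not_rs <- substring(dat$A, 1, 2) != "rs"

# Keep only those rows; drop = FALSE guards against the result
# collapsing to a vector if the data frame had a single column.
res <- dat[not_rs, , drop = FALSE]
print(res)
```

Only the row containing "ab1234567" is kept.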

All, of course, assuming I have understood the query correctly.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Mon, Jan 30, 2017 at 8:59 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> Hello,
>
> Try to study the following example.
>
> A <- c("rs10000056", "rs10000076", "ab1234567")
> x <- 1:3
> dat <- data.frame(A, x)
>
> inx <- grepl("^rs", dat$A)
> dat[!inx, ]
>
>
> Hope this helps,
>
> Rui Barradas
>
> Em 30-01-2017 14:23, greg holly escreveu:
>>
>> Hi all;
>>
>> I have a file which has about 3.000.000 lines. Most of the lines at first
>> column start with "rs", for example, rs10000056, rs10000076 and so on. I
>> would like to get the lines which do not start with "rs" . Your helps
>> highly appreciated.
>>
>> Regards,
>>
>> Greg
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


