[R] Regular expressions, genbank
arun
smartpink111 at yahoo.com
Thu Feb 6 19:55:41 CET 2014
Hi,
One way would be:
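vec1 <- c("CDS 3300..4037", "CDS complement(3300..4037)", "CDS 3300<..4037",
          "CDS join(21467..26641,27577..28890)",
          "CDS complement(join(30708..31700,31931..31984))", "CDS 3300<..>4037")
library(stringr)
# 1. delete numbers followed by "<" or preceded by ">"
# 2. replace each run of non-digits with a single space
# 3. trim, split on spaces and convert to numeric
as.numeric(unlist(strsplit(str_trim(gsub("\\D+", " ", gsub("\\d+<|>\\d+", "", vec1))), " ")))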
# [1] 3300 4037 3300 4037 4037 21467 26641 27577 28890 30708 31700 31931
#[13] 31984
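The same numbers can also be pulled out in base R, without stringr. This is a minimal sketch that assumes the same `vec1` and the same exclusion rule as above. `regmatches()` with `gregexpr()` returns every run of digits on each line, so the trim and split steps are not needed. Keeping the per-line list (`coords`, an illustrative name) also preserves which coordinates came from which CDS line.

```r
# Base-R sketch (no stringr): remove the flagged numbers, then extract
# every remaining run of digits with gregexpr()/regmatches().
vec1 <- c("CDS 3300..4037", "CDS complement(3300..4037)", "CDS 3300<..4037",
          "CDS join(21467..26641,27577..28890)",
          "CDS complement(join(30708..31700,31931..31984))", "CDS 3300<..>4037")

# Same exclusion rule as the one-liner: a number followed by "<" or preceded by ">"
cleaned <- gsub("\\d+<|>\\d+", "", vec1)

# One numeric vector per input line (numeric(0) when nothing is left)
coords <- lapply(regmatches(cleaned, gregexpr("[0-9]+", cleaned)), as.numeric)

# Flatten to get the single vector produced by the one-liner
flat <- unlist(coords)
```

The flattened result has the same 13 numbers as the stringr version. Use `coords` instead if you need to know which line each pair came from.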
A.K.
Hi,
I have been using R for the past 1.5 years and usually have
found topics to be relatively easy to learn on your own, but I am
finding the learning curve for regular expressions to be a little
steep, especially since I haven't found any good tutorials. While I
intend to spend more time systematically learning how to write regular
expressions properly, I have a project that is coming due and can't wait
for that, so I was hoping to get some direct help.
I need to extract all the numbers in lines with the following formats:
"CDS 3300..4037"
or
"CDS complement(3300..4037)"
or
"CDS join(21467..26641,27577..28890)"
or
"CDS complement(join(30708..31700,31931..31984))"
but not if any of the numbers are preceded by "<" or followed by ">".
Many thanks in advance!
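The rule as stated ("preceded by "<" or followed by ">"") does not quite match the answer's test data ("3300<..4037"). GenBank location syntax marks partial ends as "<3300..4037" and "3300..>4037". The sketch below is minimal. It assumes that a number counts as flagged whenever "<" or ">" touches it on either side. `extract_coords()` and its `drop_line` argument are hypothetical names. `drop_line = TRUE` covers the other reading of "not if any of the numbers...", which skips the whole line.

```r
# Hypothetical helper (name and arguments are illustrative, not from any package).
# A number touching "<" or ">" on either side is treated as a partial coordinate.
extract_coords <- function(x, drop_line = FALSE) {
  flagged <- "[<>][0-9]+|[0-9]+[<>]"
  if (drop_line) {
    # skip every line that contains a flagged number
    x <- x[!grepl(flagged, x)]
  } else {
    # remove only the flagged numbers, keep the rest of the line
    x <- gsub(flagged, "", x)
  }
  # one numeric vector per remaining line
  lapply(regmatches(x, gregexpr("[0-9]+", x)), as.numeric)
}

x <- c("CDS 3300..4037", "CDS <3300..4037", "CDS 3300..>4037",
       "CDS 3300<..4037", "CDS join(21467..26641,27577..28890)")
r1 <- extract_coords(x)                    # flagged numbers removed
r2 <- extract_coords(x, drop_line = TRUE)  # flagged lines removed
```

On the answer's `vec1`, `unlist(extract_coords(vec1))` gives the same 13 numbers as the one-liner, because "3300<" and ">4037" are flagged under either placement.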