[R] how to implement string pattern extraction in R
William Dunlap
wdunlap at tibco.com
Mon Aug 23 02:40:05 CEST 2010
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Waverley @
> Palo Alto
> Sent: Sunday, August 22, 2010 3:51 PM
> To: r-help
> Subject: Re: [R] how to implement string pattern extraction in R
>
> Thanks for the reply to pointing me to the grep functions.
>
> I have checked the readme page
> http://pbil.univ-lyon1.fr/library/base/html/grep.html before I sent
> the help request.
>
> Just don't know how to extract a substring matching a pattern out of a
> string. Can someone give me the example code similar to that in perl
> to extract the prefix out of the string.
The S language pattern matching functions are vectorized so
let's compare the S way to the vectorized version of your perl code.
I think the following is idiomatic perl:
@x=qw(AAAA.txt BBBB.qaz CCCC.txt);
@prefixes=map { if($_ =~ /(.*?)\.txt/) { $1 ; } else { "<not txt file>"; } } @x ;
print( join(", ", @prefixes), "\n") ;
^Z # or ^D on Unix
AAAA, <not txt file>, CCCC
The S equivalent to the @x=qw(...) would be
> x <- c("AAAA.txt", "BBBB.qaz", "CCCC.txt")
and to get the part before the ".txt", if there is a ".txt" at
the end you could do one of
> ifelse(grepl("\\.txt$", x),
         sub(pattern="\\.txt$", replacement="", x),
         "<not txt file>")
[1] "AAAA" "<not txt file>" "CCCC"
or
> ifelse((r <- regexpr("\\.txt$", x))>0,
         substring(x, 1, r - 1),
         "<not txt file>")
[1] "AAAA" "<not txt file>" "CCCC"
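If you want something closer to Perl's $1, a capture group can be pulled out with regexec() and regmatches(). This is only a sketch: it assumes an R version that has both functions (they are newer than the functions used above), and the names 'm' and 'prefixes' are just illustrative.

```r
x <- c("AAAA.txt", "BBBB.qaz", "CCCC.txt")

# regexec() records the position of the whole match and of each
# parenthesized group; regmatches() turns those positions into strings.
# Each list element is c(whole match, group 1), or character(0) when
# the pattern did not match.
m <- regmatches(x, regexec("^(.*)\\.txt$", x))

# Take group 1 where there was a match, a placeholder otherwise.
prefixes <- vapply(m,
                   function(p) if (length(p) >= 2) p[2] else "<not txt file>",
                   character(1), USE.NAMES = FALSE)
print(prefixes)
# [1] "AAAA"           "<not txt file>" "CCCC"
```

The pattern is anchored at both ends here so that group 1 is exactly the part before a trailing ".txt".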
Perl's =~ returns a value that says whether there was a match,
and it stores the details of the match in the magic
variables $1, $2, ... (and $', $`, and $&). S language
functions don't use magic variables; instead they can store
the extra information as attributes of the return value.
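To make the attribute idea concrete, here is a small sketch that looks at what regexpr() returns for the same x. The variable names 'starts' and 'lens' are only for illustration.

```r
x <- c("AAAA.txt", "BBBB.qaz", "CCCC.txt")
r <- regexpr("\\.txt$", x)

# The value itself is the start position of the match (-1 for no match).
starts <- as.vector(r)
print(starts)
# [1]  5 -1  5

# The length of each match is carried along as an attribute.
lens <- attr(r, "match.length")
print(lens)
# [1]  4 -1  4

# Start and length together locate the matched text, much as $& does in Perl.
print(substring(x, starts, starts + lens - 1)[starts > 0])
# [1] ".txt" ".txt"
```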
(The above uses only core R or S+ functions. The gsubfn package
offers more possibilities.)
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
>
> Thanks much.
>
> On Sun, Aug 22, 2010 at 3:05 PM, Waverley @ Palo Alto
> <waverley.paloalto at gmail.com> wrote:
> > Hi,
> >
> > In perl, to get a substring matching a particular pattern can be
> > implemented like the following example:
> >
> > $x = "AAAA.txt";
> > if ($x=~ /(.*?)\.txt/){
> > $prefix = $1;
> > }
> >
> > So how to do the same thing in R?
> >
> > Can someone provide me the code sample?
> >
> > Thanks much in advance.
> >
> > --
> > Waverley @ Palo Alto
> >
>
>
>
> --
> Waverley @ Palo Alto
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>