[R] regexpr help (match.length=0)

Matt Shotwell shotwelm at musc.edu
Wed Jun 2 01:05:48 CEST 2010


On Tue, 2010-06-01 at 16:43 -0400, Erik Iverson wrote:
> 
> McGehee, Robert wrote:
> > R-help,
> > Sorry if this is more of a regex question than an R question. However,
> > help would be appreciated on my use of the regexpr function.
> > 
> > In the first example below, I ask for all characters (a-z) in 'abc123';
> > regexpr returns a 3-character match beginning at the first character. 
> > 
> >> regexpr("[[:alpha:]]*", "abc123")
> > [1] 1
> > attr(,"match.length")
> > [1] 3
> > 
> > However, when the text is flipped and I ask for a match of all
> > characters in '123abc', regexpr returns a zero-character match beginning
> > at the first character. Can someone explain what a zero-length match
> > means (i.e. why not return -1), and why the result isn't 4 with
> > match.length=3?
> 
> It means it matches 0 characters, which is fine since you use *, which 
> means match 0 or more occurrences of the regex.  It sounds like you want 
> + instead of *.  Also see gregexpr.

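Also, a regular expression matches at the earliest position where a
match is possible. At position one, "[[:alpha:]]*" can already match the
empty string (zero letters), so the search stops there with a match of
length zero and never reaches the three-letter match at position four.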
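
To make the two replies above concrete, here is a minimal sketch in R
comparing * and + on the flipped string. The variable names (m_star,
m_plus, m_all) are only illustrative, and the "a1bc2" input is made up
for the gregexpr call.

```r
x <- "123abc"

# With *, the empty string at position 1 is a valid match, so it is returned.
m_star <- regexpr("[[:alpha:]]*", x)
as.integer(m_star)                # 1
attr(m_star, "match.length")      # 0

# With +, at least one letter is required, so the match starts at position 4.
m_plus <- regexpr("[[:alpha:]]+", x)
as.integer(m_plus)                # 4
attr(m_plus, "match.length")      # 3

# Extract the matched text from the start position and length.
matched <- substring(x, m_plus, m_plus + attr(m_plus, "match.length") - 1)
matched                           # "abc"

# gregexpr returns every match in the string, not just the first one.
m_all <- gregexpr("[[:alpha:]]+", "a1bc2")[[1]]
as.integer(m_all)                 # 1 3
attr(m_all, "match.length")       # 1 2
```

Since + requires at least one letter, the empty match at position one is
no longer allowed, and the search moves on to position four.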

Matt Shotwell
Graduate Student 
Division of Biostatistics and Epidemiology
Medical University of South Carolina

> > 
> >> regexpr("[[:alpha:]]*", "123abc")
> > [1] 1
> > attr(,"match.length")
> > [1] 0
> >
> 


