[Rd] Versions of PCRE, documenting what grep etc do.

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Oct 24 18:33:18 MEST 2003


On Fri, 24 Oct 2003, Kurt Hornik wrote:

> >>>>> Prof Brian Ripley writes:
> 
> > A couple of weeks back there was some discussion about documenting the
> > regular expressions as used in R.  Several years ago the problem was
> > that this was OS-dependent, and to plug that problem we incorporated
> > regexp code from a version of GNU grep, later updated to grep-2.4.2 in
> > R 1.2.0.
> 
> > I have been looking at documenting what grep(perl=TRUE) does, and we
> > have a similar problem in that the current PCRE, 4.4, implements
> > rather more of Perl's regexps than 3.9 (which is in 1.8.0 if the OS
> > does not supply it; RH8.0 also has PCRE 3.9, and whichever version of
> > Debian is on franz has PCRE 3.4).
> 
> > I could add a configure check for PCRE >= 4.0, and I think probably
> > should do that.  However, my inclination is to always use the version
> > of PCRE in the R sources and thereby ensure that all builds of R have
> > the same version, the one I will document.  Comments, please.
> 
> I think we should in any case allow maintainers of binary packages on
> platforms with advanced package management systems to force the use of
> shared libraries the system can provide.  (So the binary maintainers
> would need to verify that the system package provides the right libs and
> headers.)
> 
> Not sure about the default: we typically try to use available system
> resources, unless this is bound to cause problems, and regex was of the
> latter type, afaicr.  

With a configure check for >= 4.0 I am reasonably happy to have 
--without-pcre as the default and allow --with-pcre at people's peril.

> > For PCRE 4.4 there is a long man page that I will use as a basis for
> > the documentation.  I am inclined just to include either a text or PDF
> > version of the man page -- any preferences for which form?
> 
> Depends on where you would put the docs, I think.  Btw, where can 4.4 be
> found?

At the ftp site mentioned on ?grep, at least earlier this week.

> > For the non-Perl regexps it is harder, as I am unsure exactly what
> > patterns the GNU regex we have accepts.  (From a problem which
> > occurred with some Sweave regexps, I think it accepts more than it is
> > intended to.)  One fairly good docu source is the GNU grep man page:
> > does anyone know a better one?  I had thought of writing a regexp.Rd
> > help page to which grep.Rd could refer.
> 
> That would be great.  Linux has a regex(7) purported to be "taken from
> Henry Spencer's regex package", which might be used as a start.  The old
> GNU regex .tar.gz has a texinfo file, but does not help for what we
> need, I think.

The GNU grep 2.4.2 man page and texinfo file give me enough, except I 
don't understand them well enough.  (What is said about extended vs basic 
expressions is unclear at best.)

The Solaris 8 man pages are better and they do document POSIX regexps,
so I will use some of their ideas.

> [I recently looked for available regexp docs, but was not too
> successful.]
> 
> > None of this is imminent (I am too busy) but is intended for the next 
> > minor release (which may be called 1.9.0 or 2.0.0, I gather).
> 
> Too bad :-(

I might try to put regexp.Rd (I have a start) in 1.8.1 then.  But the PCRE 
stuff will need to wait for R-devel's release.

Brian

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


