[Rd] Versions of PCRE, documenting what grep etc do.
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Oct 24 18:33:18 MEST 2003
On Fri, 24 Oct 2003, Kurt Hornik wrote:
> >>>>> Prof Brian Ripley writes:
>
> > A couple of weeks back there was some discussion about documenting the
> > regular expressions as used in R. Several years ago the problem was
> > that this was OS-dependent, and to plug that problem we incorporated
> > regexp code from a version of GNU grep, later updated to grep-2.4.2 in
> > R 1.2.0.
>
> > I have been looking at documenting what grep(perl=TRUE) does, and we
> > have a similar problem in that the current PCRE, 4.4, implements
> > rather more of Perl's regexps than 3.9 (which is in 1.8.0 if the OS
> > does not supply it; RH8.0 also has PCRE 3.9, and whichever version of
> > Debian is on franz has PCRE 3.4).
>
> > I could add a configure check for PCRE >= 4.0, and I think probably
> > should do that. However, my inclination is to always use the version
> > of PCRE in the R sources and thereby ensure that all builds of R have
> > the same version, the one I will document. Comments, please.
>
> I think we should in any case allow maintainers of binary packages on
> platforms with advanced package management systems to force the use of
> shared libraries the system can provide. (So the binary maintainers
> would need to verify that the system package provides the right libs and
> headers.)
>
> Not sure about the default: we typically try to use available system
> resources, unless this is bound to cause problems, and regex was of the
> latter type, afaicr.
With a configure check for >= 4.0 I am reasonably happy to have
--without-pcre as the default and allow --with-pcre at people's peril.
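
For illustration only: the >= 4.0 test comes down to comparing the major
and minor numbers of the PCRE version string numerically rather than as
text. The real check would live in the configure script, so the Python
below is just a sketch of the comparison. The helper names parse_version
and pcre_at_least are hypothetical, and it assumes the version string
begins with "major.minor".

```python
import re


def parse_version(text):
    """Return (major, minor) from a string that starts with 'major.minor'."""
    match = re.match(r"\s*(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def pcre_at_least(text, minimum=(4, 0)):
    """True if the version in `text` is at least `minimum` (hypothetical helper)."""
    # Tuples compare element by element, so 10.2 correctly ranks above 4.0,
    # which a plain string comparison would get wrong.
    return parse_version(text) >= minimum


if __name__ == "__main__":
    # Versions mentioned in this thread.
    for version in ("4.4", "3.9", "3.4"):
        print(version, pcre_at_least(version))
```

With the versions named above, only 4.4 would pass such a check; 3.9 and
3.4 would fall back to the copy of PCRE in the R sources.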
> > For PCRE 4.4 there is a long man page that I will use as a basis for
> > the documentation. I am inclined just to include either a text or PDF
> > version of the man page -- any preferences for which form?
>
> Depends on where you would put the docs, I think. Btw, where can 4.4 be
> found?
At the ftp site mentioned on ?grep, at least earlier this week.
> > For the non-Perl regexps it is harder, as I am unsure exactly what
> > patterns the GNU regex we have accepts. (From a problem which
> > occurred with some Sweave regexps, I think it accepts more than it is
> > intended to.) One fairly good docu source is the GNU grep man page:
> > does anyone know a better one? I had thought of writing a regexp.Rd
> > help page to which grep.Rd could refer.
>
> That would be great. Linux has a regex(7) purported to be "taken from
> Henry Spencer's regex package", which might be used as a start. The old
> GNU regex .tar.gz has a texinfo file, but does not help for what we
> need, I think.
The GNU grep 2.4.2 man page and texinfo file give me enough, except I
don't understand them well enough. (What is said about extended vs basic
expressions is unclear at best.)
The Solaris 8 man pages are better and they do document POSIX regexps,
so I will use some of their ideas.
> [I recently looked for available regexp docs, but was not too
> successful.]
>
> > None of this is imminent (I am too busy) but is intended for the next
> > minor release (which may be called 1.9.0 or 2.0.0, I gather).
>
> Too bad :-(
I might try to put regexp.Rd (I have a start) in 1.8.1 then. But the PCRE
stuff will need to wait for R-devel's release.
Brian
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595