[R] Regex matching that gives byte offset?

Johannes Graumann johannes_graumann at web.de
Mon Nov 2 23:01:45 CET 2009


On Monday 02 November 2009 13:41:45 Prof Brian Ripley wrote:
> On Mon, 2 Nov 2009, Johannes Graumann wrote:
> > Hmmm ... that should do it, thanks. But how would one use this on a file
> > without reading it into memory completely?
> 
> ?file, ?readLines, ?readBin
> 
> will tell you about connections.
... all of which only let me read line by line, and a regexpr() on a single 
line will not give me the absolute offset within the file.
"grep -buo" on the Unix command line is really fast for this. If I can't find 
a native R equivalent, I'm of a mind to do this via a system call - ugly and 
not portable, but SOOO fast ... is it possible in R?
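
For reference, a minimal sketch of how absolute byte offsets might be collected in plain R without loading the whole file: read raw chunks with readBin(), search each chunk with gregexpr(..., useBytes = TRUE), and add the chunk's position in the file to every hit. The names byte_offsets() and chunk_size are hypothetical, not part of base R. The sketch assumes a fixed string rather than a full regular expression, a file without embedded NUL bytes, and a pattern whose bytes are encoded the same way as the file. Self-overlapping patterns may be reported differently from grep at chunk boundaries.

```r
# Sketch: 0-based byte offsets (as in "grep -b") of a fixed string in a file,
# scanned chunk by chunk. byte_offsets() and chunk_size are hypothetical names.
byte_offsets <- function(path, pattern, chunk_size = 1e6) {
  pat_len <- nchar(pattern, type = "bytes")
  stopifnot(pat_len >= 1, chunk_size >= pat_len)
  con <- file(path, open = "rb")
  on.exit(close(con))

  offsets  <- numeric(0)  # doubles, so offsets beyond 2^31 - 1 do not overflow
  carry    <- raw(0)      # tail of the previous buffer, at most pat_len - 1 bytes
  consumed <- 0           # bytes read from the file so far

  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break
    buf       <- c(carry, chunk)
    buf_start <- consumed - length(carry)  # file offset of the first byte of buf
    consumed  <- consumed + length(chunk)

    # rawToChar() stops with an error if buf contains embedded NUL bytes
    hits <- gregexpr(pattern, rawToChar(buf), fixed = TRUE, useBytes = TRUE)[[1]]
    if (hits[1] != -1L) {
      offsets <- c(offsets, buf_start + as.numeric(hits) - 1)
    }

    # Keep pat_len - 1 trailing bytes so a match split across two chunks is
    # still found, while no complete match can be counted twice
    keep  <- min(pat_len - 1L, length(buf))
    carry <- if (keep > 0L) buf[seq.int(length(buf) - keep + 1L, length(buf))] else raw(0)
  }
  offsets
}

# Usage on a small temporary file, with a tiny chunk size to force chunking
tmp <- tempfile()
writeBin(charToRaw("<a>\n<b/>\n<a>\n"), tmp)
print(byte_offsets(tmp, "<a>", chunk_size = 4))
# [1] 0 9
```

The offsets could then be passed to seek() on a binary connection to jump straight to each match. Whether this comes close to the speed of grep on a 600MB file is untested here.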

Joh

> 
> > Joh
> >
> > On Wednesday 28 October 2009 16:29:00 Prof Brian Ripley wrote:
> >> Do you mean like regexpr() (on the same help page)?
> >>
> >> Depending on your locale, you might actually prefer the character
> >> offset: if you want to match in a MBCS and have byte offsets you will
> >> need to work a bit harder if useBytes=TRUE is not sufficient for you.
> >>
> >> On Wed, 28 Oct 2009, Johannes Graumann wrote:
> >>> Hi,
> >>>
> >>> Is there any way of doing 'grep' or something like it on the content
> >>> of a text file and extract the byte positioning of the match in the
> >>> file? I'm facing the need to access rather largish (>600MB) XML files
> >>> and would like to be able to index them ...
> >>>
> >>> Thanks for any help or flogging,
> >>>
> >>> Joh
> >>>
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html and provide commented,
> >>> minimal, self-contained, reproducible code.
>
