[R] gsub: replacing a.*a if no occurrence of b in .*
skiadas at hanover.edu
Sat Feb 24 17:24:34 CET 2007
All these methods do assume that you don't have nested <tag>'s, like so:
<tag><tag>foo</tag>useful stuff</tag>some garbage</tag>
For that you would really need a true parser. So I would double-check
to make sure this doesn't happen.
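One quick way to do that double-check is to pull out the tag tokens in
order and look for two opening tags in a row. This is only a rough
sketch: the function name has_nested_tags is made up for illustration,
and it assumes the tag name is a plain word (no regex metacharacters)
and that the whole file has been read into a single string.

```r
# Hypothetical helper: TRUE if an opening tag is directly followed by
# another opening tag, i.e. the tags are nested.
has_nested_tags <- function(x, tag = "tag") {
  open <- paste0("<", tag, ">")
  # All opening and closing tags, in the order they appear in x
  tokens <- regmatches(x, gregexpr(paste0("</?", tag, ">"), x))[[1]]
  n <- length(tokens)
  # Compare each token with the one that follows it
  n >= 2 && any(tokens[-n] == open & tokens[-1] == open)
}

nested <- "<tag><tag>foo</tag>useful stuff</tag>some garbage</tag>"
flat   <- "<tag>foo</tag>some garbage</tag>"
has_nested_tags(nested)  # TRUE
has_nested_tags(flat)    # FALSE
```

For a file, something like paste(readLines(path), collapse = "\n")
would give the single string this expects.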
Do you have any control over how those XML files are generated,
though? It sounds to me like it might be easier to fix the utility
generating those XML files, since it is clearly doing something wrong.
On Feb 24, 2007, at 11:07 AM, Gabor Grothendieck wrote:
> I assume <tag> is known.
> This removes any occurrence </tag>.*</tag> where .* does not
> contain <tag> or </tag>.
> The regular expression, re, matches </tag>, then does a non-greedy
> match (?U) for anything followed by </tag> but uses a zero
> width lookahead subexpression (?=...) for the second </tag>
> so that it can be rematched again. gsubfn in package
> gsubfn is like the usual gsub except that instead of
> replacing the match with a string it passes the match
> to function f and then replaces the match with the output
> of f. See the gsubfn home page and vignette.
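For reference, the same idea can be sketched in base R without gsubfn
by pushing the "does not contain <tag> or </tag>" condition into the
pattern itself with a negative lookahead. This is not Gabor's code,
just an approximation of the behaviour he describes; the sample string
is invented, and (?s) is added on the assumption that the garbage may
span several lines.

```r
# </tag>, then any run of characters that never starts <tag> or </tag>,
# up to (but not consuming) the next </tag>. Because the second </tag>
# is only looked at, it can start the next match.
pattern <- "(?s)</tag>(?:(?!</?tag>).)*(?=</tag>)"

x <- "<tag>foo</tag>some garbage</tag><tag>bar</tag>"
cleaned <- gsub(pattern, "", x, perl = TRUE)
cleaned
# "<tag>foo</tag><tag>bar</tag>"

# Several stray closing tags in a row are removed one after another
y <- "<tag>a</tag>g1</tag>g2</tag>"
cleaned_y <- gsub(pattern, "", y, perl = TRUE)
```

Note that perl = TRUE is required, since lookaheads are not available
in R's default regular expression engine.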
Department of Mathematics and Computer Science